Hacker News | qeternity's comments

> Are they trying to be a full cloud platform like everyone else?

Yes.


Second line of the post:

> The main objective is to learn writing attention in CUDA C++, since many features are not available in Triton, such as MXFP8 / NVFP4 MMA for sm120.


Yes… I read it. If the feature is missing, why not contribute it instead?


How many PRs have you landed in Triton that you can so blithely say "contribute it"?


I mean, you can look at the most recent commit and see that the infrastructure is being built out for this right now (of course OpenAI doesn't care about sm_120, though).


I don't know what this comment has to do with my point that OAI doesn't take commits from randoms, especially for infra code.


By all means, the guy could have written the Triton fixes he needs and NOT sent them upstream. It would still make more sense to do that! He's obviously an expert, and I was sincerely wondering: why bother with the C++ stuff if he already knew the better way and has the chops to implement it?


There's an enormous difference between writing kernels and writing compiler infra.


Yeah, they do.


FAANG has been replaced by Mag7: Alphabet, Amazon, Apple, Broadcom, Meta, Microsoft, and Nvidia.


Can we switch to BANAMMA? Ba-nam-ma.

* Broadcom
* Alphabet
* Nvidia
* Amazon
* Meta
* Microsoft
* Apple


Why not Bloomberg instead lol


Does Broadcom do anything but get hate for their shitty decisions? They are becoming, if they aren't already, the new Oracle.


Lol get out of the echo chamber

Edit: to make this helpful, look at Broadcom's interconnect and switching technology and co-packaged optics.


> Claude Code Max is affordable with Opus

Because Anthropic is presumably massively subsidizing the usage.


Isn't it all heavily subsidized by VC money at this time?


The APIs are marginally profitable. You can calculate the lifecycle cost of running the open models on clusters with batched inference and figure out it's less than what they charge.

The training and research are very expensive. The fixed-price subscriptions are 100% a sweetheart deal.


OpenAI, for its part, is tracking toward $12-$15 billion in annual sales. If they slapped a basic ad/referral model onto what they're already doing, it's an easily profitable enterprise doing $30+ billion in sales next year. Frankly, they should have already built and deployed that: it would make their free tier instantly profitable, and they could boost usage limits and choke off the competition. It's the very straightforward path to financially ruining their various weaker competitors. Anthropic is Lyft in this scenario (and I say that as a big fan of Claude).


Which doesn’t factor into my immediate decisions.


> but forget cost if we want to compare solely the quality

I think this is the whole reason not to compare it to Opus...


I agree. Opus is cost-prohibitive for most longer coding tasks. The increase in output quality doesn't justify the cost.


120B MoE. The 20B is dense.

As far as dense models go, it's larger than many, but Mistral has released multiple ~120B dense models, not to mention Llama 3 405B.


For posterity, since it was shown that it is actually MoE:

> 21B parameters with 3.6B active parameters


How much RAM do you need to run this?!


Probably about one byte per weight (parameter) plus a bit extra for the key-value cache (depends on the size of the context window).


You can go below one byte per parameter. 4-bit quantization is fairly popular. It does affect quality (for some models more so than others), but generally speaking a 4-bit quantized model is still going to do significantly better than an 8-bit model with half the parameters.
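
To make that concrete, here's a rough back-of-envelope sketch in Python. The layer count, KV-head count, head dimension, and context length below are illustrative placeholders, not the specs of any particular model:

    # Rough RAM estimate: weights + KV cache. All model-shape numbers
    # here are illustrative assumptions, not real specs.
    def estimate_ram_gb(n_params, bytes_per_param, n_layers,
                        n_kv_heads, head_dim, context_len,
                        kv_bytes_per_elem=2):
        weights = n_params * bytes_per_param
        # K and V: 2 tensors per layer, each [context_len, n_kv_heads, head_dim]
        kv_cache = (2 * n_layers * n_kv_heads * head_dim
                    * context_len * kv_bytes_per_elem)
        return (weights + kv_cache) / 1e9

    # ~120B params at 8-bit (1 byte/param) vs 4-bit (0.5 byte/param)
    for bpp in (1.0, 0.5):
        print(bpp, round(estimate_ram_gb(120e9, bpp, 36, 8, 128, 32_768), 1))

With those made-up numbers the weights dominate (roughly 125 GB at 8-bit vs 65 GB at 4-bit) and the KV cache is only a few GB, which is why bytes per parameter is the number that matters most.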


I think this is overselling most professors.


Take the API pricing and assume 24/7 usage (or whatever your working hours are). That's your worst-case fixed cost.

More likely, that sum is higher than they want to pay. So really it's not about predictability.
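
For a sketch of that worst-case arithmetic, here's a tiny Python example; the token volumes and per-token price are made-up placeholders, not any provider's actual rate card:

    # Hypothetical ceiling on API spend if usage is continuous during
    # working hours. Every number here is a placeholder assumption.
    tokens_per_turn   = 50_000        # agentic coding turns are token-heavy
    turns_per_hour    = 30
    usd_per_1k_tokens = 0.01          # blended input/output placeholder price
    hours_per_month   = 8 * 22        # "or whatever working hours are"

    monthly_ceiling = (tokens_per_turn * turns_per_hour
                       * hours_per_month * usd_per_1k_tokens / 1_000)
    print(f"~${monthly_ceiling:,.0f}/month upper bound")

If a ceiling like that comes out well above the flat subscription price, the subscription is the subsidy, not the predictability.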


> I don’t know how you got to the conclusion that they only used SRAM.

Because they are doing 1,500 tokens per second.


> if they result in a greater return.

Greater return than what and to whom?

We already have existing labor markets that are very capable of determining returns.


> Greater return than what and to whom?

Greater return for the government paying for a UBI, compared to not paying for a UBI.

> We already have existing labor markets that are very capable of determining returns.

I'm not sure I understand how "existing labour markets" are going to solve the three things I listed: education, caregiving, and parents taking time off to look after their kids.

The issue with parents being absent is that it results in negative externalities: higher crime rates, an alienated society, low literacy rates. The existing labour market is great at placing parents into a job efficiently, but it has absolutely nothing to do with keeping their kids out of prison. Nor should it, really, because externalities are a government-level coordination problem.

When it comes to education, the issue is again a coordination problem. Companies might do some training, but they generally prefer to foist the risk off onto employees, other companies, and governments by hiring people who are already educated. Again, this is a coordination problem, because any individual company that skips training and just hires educated workers directly will be more efficient, but those educated workers have to come from somewhere.

I will concede that it's more efficient not to take care of the elderly. I question whether it is desirable, however.


Those labour markets are in shambles atm for most people who aren't upper middle class.


In shambles compared to when? Quality of life is the highest it's ever been across socioeconomic strata. It's just that our expectations outpace reality.


