Hacker News | qeternity's comments

> Are they trying to be a full cloud platform like everyone else?

Yes.


Second line of the post:

> The main objective is to learn writing attention in CUDA C++, since many features are not available in Triton, such as MXFP8 / NVFP4 MMA for sm120.


Yes… I read it. If the feature is missing, why not contribute it instead?


How many PRs have you landed in Triton that you can so blithely say "contribute it"?


I mean, you can look at the most recent commit and see that the infrastructure is being built out for this right now (of course OpenAI doesn't care about sm_120, though).


I don't know what this comment has to do with my point that OAI doesn't take commits from randoms, especially for infra code.


By all means, the guy could have written the Triton fixes he needs and NOT sent them upstream. It would still make more sense to do that! He's obviously an expert, and I was sincerely wondering: why bother with the C++ stuff if he already knew the better way and has the chops to implement it?


There's an enormous difference between writing kernels and writing compiler infra.


Yeah, they do.


FAANG has been replaced by Mag7: Alphabet, Amazon, Apple, Broadcom, Meta, Microsoft, and Nvidia.


Can we switch to BANAMMA? Ba-nam-ma.

* Broadcom
* Alphabet
* Nvidia
* Amazon
* Meta
* Microsoft
* Apple


Why not Bloomberg instead lol


Does Broadcom do anything but get hate for their shitty decisions? They are becoming, if they aren't already, the new Oracle.


Lol get out of the echo chamber

Edit: to make this helpful, look at Broadcom's interconnect and switching technology and co-packaged optics.


> Claude Code Max is affordable with Opus

Because Anthropic is presumably massively subsidizing the usage.


Isn't it all heavily subsidized by VC money at this time?


The APIs are marginally profitable. You can calculate the lifecycle cost of running the open models on clusters with batched inference and figure out it's less than what they charge.

The training and research are very expensive. The fixed-price subscriptions are 100% a sweetheart deal.


OpenAI, for its part, is tracking toward $12-$15 billion in annual sales. If they slapped a basic ad/referral model onto what they're already doing, it's an easily profitable enterprise doing $30+ billion in sales next year. Frankly, they should have already built and deployed that: it would make their free tier instantly profitable, and they could boost usage limits and choke off the competition. It's the very straightforward path to financially ruining their various weaker competitors. Anthropic is Lyft in this scenario (and I say that as a big fan of Claude).


Which doesn’t factor into my immediate decisions.


> but forget cost if we want to compare solely the quality

I think this is the whole reason not to compare it to Opus...


I agree. Opus is cost-prohibitive for most longer coding tasks. The increase in output quality doesn't justify the cost.


120B MoE. The 20B is dense.

As far as dense models go, it's larger than many, but Mistral has released multiple ~120B dense models, not to mention Llama 3 405B.


For posterity, since it was shown that it is actually MoE:

> 21B parameters with 3.6B active parameters


How much RAM do you need to run this?!


Probably about one byte per weight (parameter) plus a bit extra for the key-value cache (depends on the size of the context window).


You can go below one byte per parameter. 4-bit quantization is fairly popular. It does affect quality (for some models more so than others), but generally speaking a 4-bit quantized model is still going to do significantly better than an 8-bit model with half the parameters.
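
To make that concrete, here's a rough back-of-envelope sketch in Python. The layer count, KV-head count, head dimension, and context length below are illustrative placeholders, not the specs of any particular model:

    # Rough RAM estimate: weights + KV cache. All model-shape numbers
    # here are illustrative assumptions, not real specs.
    def estimate_ram_gb(n_params, bytes_per_param, n_layers,
                        n_kv_heads, head_dim, context_len,
                        kv_bytes_per_elem=2):
        weights = n_params * bytes_per_param
        # K and V: 2 tensors per layer, each [context_len, n_kv_heads, head_dim]
        kv_cache = (2 * n_layers * n_kv_heads * head_dim
                    * context_len * kv_bytes_per_elem)
        return (weights + kv_cache) / 1e9

    # ~120B params at 8-bit (1 byte/param) vs 4-bit (0.5 byte/param)
    for bpp in (1.0, 0.5):
        print(bpp, round(estimate_ram_gb(120e9, bpp, 36, 8, 128, 32_768), 1))

With those made-up numbers the weights dominate (roughly 125 GB at 8-bit vs 65 GB at 4-bit) and the KV cache is only a few GB, which is why bytes per parameter is the number that matters most.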


I think this is overselling most professors.


Take the API pricing and assume 24/7 usage (or whatever your working hours are). That's your worst-case fixed cost.

More likely, that sum is higher than they want to pay. So really it's not about predictability.
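
For a sketch of that worst-case arithmetic, here's a tiny Python example; the token volumes and per-token price are made-up placeholders, not any provider's actual rate card:

    # Hypothetical ceiling on API spend if usage is continuous during
    # working hours. Every number here is a placeholder assumption.
    tokens_per_turn   = 50_000        # agentic coding turns are token-heavy
    turns_per_hour    = 30
    usd_per_1k_tokens = 0.01          # blended input/output placeholder price
    hours_per_month   = 8 * 22        # "or whatever working hours are"

    monthly_ceiling = (tokens_per_turn * turns_per_hour
                       * hours_per_month * usd_per_1k_tokens / 1_000)
    print(f"~${monthly_ceiling:,.0f}/month upper bound")

If a ceiling like that comes out well above the flat subscription price, the subscription is the subsidy, not the predictability.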


> I don’t know how you got to the conclusion that they only used SRAM.

Because they are doing 1,500 tokens per second.


> if they result in a greater return.

Greater return than what and to whom?

We already have existing labor markets that are very capable of determining returns.


> Greater return than what and to whom?

Greater return for the government paying for a UBI, compared to not paying for a UBI.

> We already have existing labor markets that are very capable of determining returns.

I'm not sure I understand how "existing labour markets" are going to solve the three things I listed: education, caregiving, and parents taking time off to look after their kids.

The issue with parents being absent is that it results in negative externalities: higher crime rates, an alienated society, low literacy rates. The existing labour market is great at placing parents into a job efficiently, but it has absolutely nothing to do with keeping their kids out of prison. Nor should it, really, because externalities are a government-level coordination problem.

When it comes to education, the issue is again a coordination problem. Companies might do some training, but they generally prefer to foist the risk off onto employees, other companies, and governments by hiring people who are already educated. Again, this is a coordination problem, because any individual company that skips training and just hires educated workers directly will be more efficient, but those educated workers have to come from somewhere.

I will concede that it's more efficient not to take care of the elderly. I question whether it is desirable, however.


Those labour markets are in shambles atm for most people who aren't upper middle class.


In shambles compared to when? Quality of life is the highest it's ever been across socioeconomic strata. It's just that our expectations outpace reality.


