The model is much smaller. Compute in China should be more expensive, considering all the US restrictions.


No one knows the size of o1. The only hint was a paper that suggested it was 200B parameters.

Meanwhile, DeepSeek R1 is known to be 671B parameters because its weights are openly released.


R1 is a mixture-of-experts (MoE) model with only 37B active params per token. So while it's definitely expensive to train, it's rather light on compute during inference. All you really need is lots of memory to hold the full 671B parameters.
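
Roughly, top-k expert routing works like this (a minimal PyTorch sketch with toy dimensions; the class name, sizes, and routing details are illustrative assumptions, not DeepSeek's actual implementation):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyMoELayer(nn.Module):
        """Toy MoE layer: all experts live in memory, only top_k run per token."""
        def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, n_experts)  # scores experts per token
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
                              nn.Linear(d_ff, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):  # x: (tokens, d_model)
            scores = self.router(x)                            # (tokens, n_experts)
            weights, idx = torch.topk(scores, self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            # Each token only passes through its top_k chosen experts,
            # so per-token FLOPs scale with active params, not total params.
            for slot in range(self.top_k):
                for e in range(len(self.experts)):
                    mask = idx[:, slot] == e
                    if mask.any():
                        out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
            return out

    layer = TinyMoELayer()
    y = layer(torch.randn(10, 64))  # 8 experts in memory, each token touches 2

All expert weights must stay resident because the router can pick any of them, which is why total parameters (671B) set the memory bill while active parameters (37B) set the per-token compute bill.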


Electricity prices in China are about half of what they are in the US; I expect the rest of the cost structure is similar.



