Hacker News

They perform similarly on benchmarks, which can be fudged to arbitrarily high numbers just by including the Q&A in the training data at a certain frequency, or by post-training on it. I have not been impressed with any of the DeepSeek models in real-world use.


General data: hundreds of billions of tokens per week run through DeepSeek, Qwen, and GLM models solely from users going through OpenRouter [0]. People aren't doing that for laughs or for "non-real-world use"; that's all for work and/or prod. If you look at the market share graph, at the start of the year the big three (OpenAI/Anthropic/Google) had 72% market share there. Now it's 45%. And this isn't just because of Grok: before that got big, they'd already slowly fallen to 58%.

Anecdata: our product is using a number of these models in production.

[0] https://openrouter.ai/rankings


Because it's significantly cheaper. It's on the frontier at the price it's being offered, but they're not competitive in the high intelligence & high cost quadrant.


Being number one in price vs quality, or size vs quality, is incredibly impressive, since the quality is clearly at a level that's very useful in real-world usage. If you don't find that impressive, there's not much to say.


If it were on the cost vs quality frontier I would find it impressive, but being on the price vs quality frontier isn't a marker of innovation; it's a marker of business strategy.


But it is on the cost vs quality frontier. The OpenRouter prices all come from providers, mainly US(!) companies, self-hosting these models and offering them for inference. They're absolutely not all subsidizing it to death. This isn't Chinese subsidies at play, far from it.

Ironically, I'll bet you $500 that OpenAI's and Anthropic's models are far more subsidized. We can be almost sure of this, given the losses they post and the fact above. Inference providers are effectively hardware plays: they're selling a commodity, so they can't just subsidize at scale.

On top of that I also mentioned size vs quality, where they're also frontier. Size ≈ cost.


Honestly though, hundreds of billions of tokens per week really isn't that much. My tiny little profitable SaaS business that can't even support my family yet is doing 10-20 billion tokens per month on Gemini Flash 2.5.


Looks like over the last month just DeepSeek, Qwen, and Z-AI did about 2.8 trillion tokens, which by your metric is the equivalent of about 187 tiny little profitable SaaS businesses, and that's only the traffic that goes through OpenRouter. To me that's very significant.
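Quick back-of-the-envelope check of that 187 figure, taking ~15B tokens/month as the midpoint of the 10-20 billion range quoted above (the midpoint is my assumption):

```python
# Rough sanity check: OpenRouter tokens for those three model families
# vs. one small SaaS business's monthly usage.
openrouter_monthly_tokens = 2.8e12  # DeepSeek + Qwen + Z-AI, last month
saas_monthly_tokens = 15e9          # assumed midpoint of the 10-20B range

equivalent_businesses = openrouter_monthly_tokens / saas_monthly_tokens
print(round(equivalent_businesses))  # -> 187
```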

Also, congrats on the traction! Being profitable enough to support a family is 95% local cost of living and family size, so I'm not sure about that bar, but if you're doing that many tokens you've clearly got a good number of active users. We're at a similar point but at only 100-200 million tokens per month; we're a strictly B2C app though, which tends to be less token-heavy, so that might explain it.

2.5 Flash is still fantastic, especially if you're really input-heavy; we use it too for many things, but we've found several open-weights models to have better price/quality for certain tasks. It's nice that 2.5 Flash is fast, but speed matters most for longer outputs, and for those Flash is relatively expensive. DeepSeek v3.1 is all-around cheaper, for one example.


Google just said yesterday that they're doing 7 billion tokens per minute for their customers via API. Crazy.
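For scale, if that 7 billion tokens/minute rate held around the clock (my extrapolation, not Google's claim), it would come to roughly 70 trillion tokens per week:

```python
# Extrapolating Google's stated API throughput to a weekly total,
# assuming the per-minute rate is sustained 24/7 (an assumption).
tokens_per_minute = 7e9
minutes_per_week = 60 * 24 * 7  # 10,080

tokens_per_week = tokens_per_minute * minutes_per_week
print(tokens_per_week / 1e12)  # -> 70.56 (trillion tokens/week)
```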

Thanks for the kudos, it's going well so far. But I'm in NYC and have kids, so...the bar is high :)



