> On the other hand, where I remain a skeptic is this constant banging-on that s...

dingnuts · 2025-06-02T16:59:14 1748883554

> Just the ability to speed up exploration and validation based on what a human tells it to do is already enormously useful, depending on how much you can speed up those things, and how accurate it can be.

The big question is: is it useful enough to justify the cost when the VC subsidies go away?

My phone recently offered me Gemini "now for free" and I thought "free for now, you mean. I better not get used to that. They should be required to call it a free trial."

jsnell · 2025-06-02T17:44:22 1748886262

Inference is actually quite cheap. Like, a highly competitive LLM can cost 1/25th of a search query. And it is not due to inference being subsidized by VC money.

It's also getting cheaper all the time. Something like 1000x cheaper in the last two years at the same quality level, and there's not yet any sign of a plateau.

So it'd be quite surprising if the only long-term business model turned out to be subscriptions.

Denzel · 2025-06-02T18:23:37 1748888617

Can you link to any sources that support your claim?

jsnell · 2025-06-02T19:00:29 1748890829

Sure. Here's something I'd written on the subject that I'd left lying in my drafts folder for a month, but I've now published just for you :)

https://www.snellman.net/blog/archive/2025-06-02-llms-are-ch...

It has links to public sources on the pricing of both LLMs and search, and explains why the low inference prices can't be due to the inference being subsidized. (And while there are other possible explanations, it includes a calculator for what the compound impact of all of those possible explanations could be.)

Denzel · 2025-06-02T21:49:42 1748900982

Thanks for sharing!

It's worthwhile to note that https://github.com/deepseek-ai/open-infra-index/blob/main/20... shows cost vs. theoretical income. They don't show 80% gross margins and there's probably a reason they don't share their actual gross margin.

OpenAI is the easiest counterexample that proves inference is subsidized right now. They've taken $50B in investment; surpassed 400M WAUs (https://www.reuters.com/technology/artificial-intelligence/o...); lost $5B on $4B in revenue for 2024 (https://finance.yahoo.com/news/openai-thinks-revenue-more-tr...); and project they won't be cash-flow positive until 2029.

Prices would be significantly higher if OpenAI was priced for unit profitability right now.

As for the mega-conglomerates (Google, Meta, Microsoft), GenAI is a loss leader to build platform power. GenAI doesn't need to be unit profitable, it just needs to attract and retain people on their platform, i.e. you need a Google Cloud account to use the Gemini API.

jsnell · 2025-06-02T22:21:03 1748902863

Thanks,

I believe the API prices are not subsidized, and there's an entire section devoted to that. To recap:

1) pure compute providers (rather than companies providing both the model and the compute) can't really gain anything from subsidizing. That market is already commoditized and supply-limited.

2) there is no value to gaining paid API market share -- the market share isn't sticky, and there's no benefit to just getting more usage since the terms of service for all the serious providers promise that the data won't be used for training.

3) we have data from a frontier lab on what the economics of their paid API inference are (but not the economics of other types of usage)

So the API prices set a ceiling on what the actual cost of inference can be. And that ceiling is very low relative to the prices of a comparable (but not identical) non-AI product category.

That's a very distinct case from free APIs and consumer products. The former is being given out for no cost in exchange for data, the latter for data and sticky market share. So unlike paid APIs, the incentives are there.

But given the cost structure of paid APIs, we can tell that it would be trivial for the consumer products to be profitably monetized with ads. They've got a ton of users, and the way users interact with their main product would be almost perfect for advertising.

The reason OpenAI is not making a profit isn't that inference is expensive. It's that they're choosing not to monetize like 95% of their users, despite the unit economics being very lucrative in principle. They're making a loss because for now they can, and for now the only goal of their consumer business is to maximize their growth and consumer mindshare.

If OpenAI needed to make a profit, they would not raise their prices on things being paid for. They'd just need to extract a very modest revenue from their unpaid users. (It's 500M unpaid users. To make $5B/year in revenue from them, you'd need just a $1 ARPU. That's an order of magnitude below what's realistic. Hell, that's lower than the famously hard to monetize Reddit's global ARPU.)
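A quick sanity check of the ARPU arithmetic in the paragraph above, as a minimal sketch. The two inputs are the figures stated in the comment; the variable names are only illustrative. Dividing them gives the required revenue per user per year and per month, which suggests the "$1 ARPU" figure lines up with a roughly monthly reading. Which period was intended is not stated in the comment.

```python
# Figures as stated in the comment above (not independently verified).
TARGET_REVENUE_PER_YEAR = 5_000_000_000  # USD, the $5B/year target
UNPAID_USERS = 500_000_000               # the 500M unpaid users

# Required average revenue per user (ARPU) to hit the target.
arpu_per_year = TARGET_REVENUE_PER_YEAR / UNPAID_USERS
arpu_per_month = arpu_per_year / 12

print(f"Required ARPU: ${arpu_per_year:.2f}/year, ${arpu_per_month:.2f}/month")
```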

Denzel · 2025-06-03T00:25:06 1748910306

Yes, I read your entire article and that section, hence my response. :)

1) Help me understand what you mean by “pure compute providers” here. Who are the pure compute providers and what are their financials including pricing?

2) I already responded to this - platform power is one compelling value gained from paid API market share.

3) If the frontier lab you’re talking about is DeepSeek, I’ve already responded to this as well, and you didn’t even concede the point that the 80% margin you cited is inaccurate given that it’s based on a “theoretical income”.

jsnell · 2025-06-03T02:01:49 1748916109

1) Any companies that host APIs using open-weights models (Llama, Gemma, DeepSeek, etc.) in exchange for money. There are a lot of them around, at different scales and different parts of a hosting provider's lifecycle. Check for example the OpenRouter page for any open-weights model for hosters of that model with price data.

2) (API) platform power having no value in this space has been demonstrated repeatedly. There are no network effects, because you can't use the user data to improve models. There is no lock-in, as the models are easy to substitute due to how incredibly generic the interface is. There is no loyalty, the users will jump ship instantly when better models are released. There is no purchasing power from having more scale, the primary supplier (Nvidia) isn't giving volume discounts and is actually giving preferential allocations to smaller hosting providers to fragment the market as much as possible.

Did you have some other form of platform power in mind?

3) I did not concede that point because I don't think it's relevant. They provide the exact data for their R1 inference economics:

- The cost per node: an 8*H800 node costs $16/hour ≈ $0.0044/s to run (rental price, so that covers capex + opex).

- The throughput per node: Given their traffic mix, a single node will process 75k/s input tokens and generate 15k/s output tokens.

- Pricing ($0.35/1M input when weighting for cache hit/miss, $2.2/1M output)

- From which it follows that the per-node revenue is $0.35/(1M/75k/s) = $0.026/s for input, and $2.2/(1M/15k/s)=$0.033/s for output. That's $0.06/s in revenue, substantially higher than the cost of revenue.

Like, that just is what the economics of paid R1 inference are (there being V3 in the mix doesn't matter, they're the same parameter count). Inference is really, really cheap both in absolute cost/token terms and relative to the prices people are willing to pay.
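The per-node arithmetic in the list above can be reproduced with a short script. This is a sketch using only the figures quoted in the comment; the constant and function names are illustrative. The result is a theoretical figure: it assumes every token served by the node is billed at the listed price.

```python
# Inputs as quoted in the comment above (not independently verified here).
NODE_COST_PER_HOUR = 16.0        # USD, rental price of one 8*H800 node
INPUT_TOKENS_PER_SEC = 75_000    # input tokens processed per node per second
OUTPUT_TOKENS_PER_SEC = 15_000   # output tokens generated per node per second
INPUT_PRICE_PER_MTOK = 0.35      # USD per 1M input tokens (cache-weighted)
OUTPUT_PRICE_PER_MTOK = 2.20     # USD per 1M output tokens


def revenue_per_second(tokens_per_sec: float, price_per_mtok: float) -> float:
    """Revenue per second if all tokens are billed at the given price."""
    return tokens_per_sec * price_per_mtok / 1_000_000


cost_per_sec = NODE_COST_PER_HOUR / 3600
input_rev = revenue_per_second(INPUT_TOKENS_PER_SEC, INPUT_PRICE_PER_MTOK)
output_rev = revenue_per_second(OUTPUT_TOKENS_PER_SEC, OUTPUT_PRICE_PER_MTOK)
total_rev = input_rev + output_rev
revenue_to_cost = total_rev / cost_per_sec

print(f"cost/s:    ${cost_per_sec:.4f}")
print(f"revenue/s: ${total_rev:.4f} (input ${input_rev:.4f}, output ${output_rev:.4f})")
print(f"revenue is about {revenue_to_cost:.1f}x the node cost")
```

With these inputs the revenue comes out at a bit over 13 times the node cost, which matches the "substantially higher than the cost of revenue" reading. Unbilled traffic would lower that ratio.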

Their aggregate margins are different, and we don't know how different, because here too they choose to also provide free service with no ads. But that too is a choice. If they just stopped doing that and rented fewer GPUs, their margins would be very lucrative. (Not as high as the computation suggests since the unpaid traffic allows them to batch more efficiently, but that's not going to make a 5x difference.)

But fair enough, it might be cleaner to use the straight cost per token data rather than add the indirection of margins. Either way, it seems clear that API pricing is not subsidized.

whilenot-dev · 2025-06-02T20:45:35 1748897135

Just had a quick glance, but I think I found something to add to the Objection!-section of your post:

Brave's Search API is 3$ CPM and includes Web search, Images, Videos, News, Goggles[0]. Anthropic's API is 10$ CPM for Web search (and text only?), excluding any input/output tokens from your model of choice[1], that'd be an additional 15$ CPM, assuming 1KTok per request and Claude Sonnet 4 as a good model, so ~25$ CPM.

So your default "Ratio (Search cost / LLM cost): 25.0x" seems to be more on the 0.12x side of things (Search cost / LLM cost). Mind you, I just flew over everything in 10 mins and have no experience using either API.

[0]: https://brave.com/search/api/

[1]: https://www.anthropic.com/pricing#anthropic-api
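The CPM estimate in the comment above can be sketched as follows. The prices and the 1K-tokens-per-request figure are the commenter's stated assumptions. The per-token price of $15 per 1M tokens is inferred from the "$15 CPM at 1KTok per request" statement. The function name is illustrative.

```python
# Figures taken from the comment above; all are the commenter's assumptions.
BRAVE_SEARCH_CPM = 3.0            # USD per 1000 Brave Search API queries
ANTHROPIC_WEB_SEARCH_CPM = 10.0   # USD per 1000 web searches, before tokens
TOKENS_PER_REQUEST = 1_000        # assumed tokens per request
PRICE_PER_MTOK = 15.0             # USD per 1M tokens (inferred from the $15 CPM)


def llm_search_cpm(search_fee_cpm: float, tokens_per_request: int,
                   price_per_mtok: float) -> float:
    """Cost per 1000 requests: the search fee plus the token cost."""
    token_cost_cpm = 1000 * tokens_per_request * price_per_mtok / 1_000_000
    return search_fee_cpm + token_cost_cpm


llm_cpm = llm_search_cpm(ANTHROPIC_WEB_SEARCH_CPM, TOKENS_PER_REQUEST, PRICE_PER_MTOK)
ratio = BRAVE_SEARCH_CPM / llm_cpm  # search cost / LLM cost

print(f"LLM + web search: ${llm_cpm:.2f} CPM, ratio {ratio:.2f}x")
```

The ratio is sensitive to the tokens-per-request assumption, so changing `TOKENS_PER_REQUEST` is the quickest way to see how far the estimate moves.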

diggan · 2025-06-02T17:40:14 1748886014

> The big question is: is it useful enough to justify the cost when the VC subsidies go away?

I won't claim local LLMs are nearly as good as the various top models behind paid subscriptions/APIs, but I'm certain I'd be able to find a way (for me) of working with them well enough if the entire paid/hosted ecosystem disappeared overnight. Even with models released today.

I think the VC subsidies probably "make stuff happen" faster, and without them we'd see slower progress, but I don't think 100% of the ecosystem would disappear even if 100% of VC funding disappeared. We're bound for another AI winter at some point, and some will surely survive even that :)

butlike · 2025-06-02T19:24:31 1748892271

So isn't the heuristic that if your job is easily digestible by an LLM, you're probably replaceable, but if the strong slowdown factor presents itself, you're probably doing novel work and have job security?

diggan · 2025-06-03T10:34:00 1748946840

> So isn't the heuristic that if your job is easily digestible by an LLM, you're probably replaceable

Yeah, that sounds about right to me. I wasn't talking about wholesale replacement, though, but about LLMs as a tool/augmentation. I'm not very confident an LLM would be able to replace a software engineer, but I can definitely see many workflows of a software engineer being sped up, like the exploration and validation process.