Exactly, swyx. Any flat rate pricing plan is effectively a bet against the futur...

Exactly, swyx. Any flat rate pricing plan is effectively a bet against the future. It's a grab for engineers that's subsidised. Now, the problem is that GPUs are expensive; they are a costly resource to use. Inferencing is expensive.

So what happens is inevitable:

- Wild promises of unlimited usage and consumers feeling tricked when the impossible is impossible to deliver (Cursor pricing changes).

- Quasi-unlimited usage with rate-caps, but the models get quantised to all hell? [search Twitter for folks reporting Claude feels dumber around/near outages].

- Engineers sharing tools and techniques on how to squeeze pounds out of a flat-rate plan (original post), which results in more power users doing that, which puts more pressure on margins.

In goose meme format, "What are the margins?"

https://x.com/GeoffreyHuntley/status/1945636266009399414