Exactly, swyx. Any flat-rate pricing plan is effectively a bet against the future. It's a subsidised grab for engineers. The problem is that GPUs are a costly resource to use, so inference is expensive.
So what happens is inevitable:
- Wild promises of unlimited usage, and consumers feeling tricked when the impossible turns out to be impossible to deliver (Cursor pricing changes).
- Quasi-unlimited usage with rate caps, but the models get quantised to all hell? [search Twitter for folks reporting that Claude feels dumber around outages].
- Engineers sharing tools and techniques on how to squeeze pounds out of a flat-rate plan (original post), which results in more power users doing that, which puts more pressure on margins.
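To put rough numbers on that last point, here's a minimal sketch of how the mix of users moves the margin on a flat-rate plan. Every figure in it (the price, the cost per million tokens, the usage levels) is a made-up placeholder for illustration, not real vendor pricing, and the function names are hypothetical.

```python
# All figures are hypothetical placeholders, not real vendor pricing.
FLAT_PRICE = 200.0         # assumed monthly subscription price
COST_PER_M_TOKENS = 10.0   # assumed inference cost per million tokens


def monthly_margin(tokens_millions: float) -> float:
    """Revenue minus inference cost for one subscriber in one month."""
    return FLAT_PRICE - tokens_millions * COST_PER_M_TOKENS


def blended_margin(usage: list[float]) -> float:
    """Total margin across subscribers; usage is millions of tokens per user."""
    return sum(monthly_margin(u) for u in usage)


# Nine light users subsidising one power user: still profitable overall.
mostly_casual = [5.0] * 9 + [100.0]
# Half the user base learns the power-user tricks: the plan goes underwater.
half_power = [5.0] * 5 + [100.0] * 5

print(blended_margin(mostly_casual))  # 550.0
print(blended_margin(half_power))     # -3250.0
```

With these assumed numbers a subscriber breaks even at 20 million tokens a month, so every user who gets pushed past that line by a shared technique has to be paid for by the light users who remain.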
In goose meme format, "What are the margins?"
https://x.com/GeoffreyHuntley/status/1945636266009399414