Yes, model quality is improving at a slower rate, and the race will shift to performance and efficiency.

This is good for self-hosters and devs, who will be able to run near-SOTA models like QwQ locally. I'm near the point of canceling my ChatGPT Plus and Claude subscriptions.

If you're not already self-hosting, building your own local agents, and building your own MCPs/tools, I'd encourage you to try it (a simple stack: Ollama, PydanticAI, FastMCP, QwQ 32B, Llama 3.2 3B). If you don't have a fancy GPU or an M1+ Mac, try QwQ on Groq or Gemini 2.0 Flash Lite via the Gemini API; they're super cheap and fast, and basically equivalent to (if not better than) the ChatGPT you were paying for 16 months ago.
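To give a sense of how simple the local side of that stack is: a stdlib-only sketch of calling a local Ollama server's chat endpoint directly, assuming Ollama is running on its default port (11434) and you've pulled a model tagged `qwq:32b`. PydanticAI and FastMCP layer agents and tools on top of this same kind of call; they're omitted here.

```python
import json
import urllib.request

# Ollama's default local chat endpoint (assumes `ollama serve` is running).
OLLAMA_URL = "http://localhost:11434/api/chat"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming chat request for a local Ollama server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        return json.loads(resp.read())["message"]["content"]


# Usage (needs a running Ollama server with the model pulled):
# reply = chat("qwq:32b", "In one sentence, what is MCP?")
```

Once this works, swapping the model name is all it takes to compare QwQ against a small model like Llama 3.2 3B for cheap tool-routing tasks.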
