I was wondering at one point whether all these companies have just hit a wall in the performance and improvement of the underlying technology, and whether all the version updates and new "models" they present are really just them editing and building ever more complex system prompts. We're also working internally with Copilot, and whenever some PM spots a weird result, we end up just adding all kinds of edge-case exceptions to our default prompt.
Speaking of a performance wall: the Claude 4 results were added to the Aider LLM Leaderboard [0] yesterday. Opus 4 is clearly below Gemini 2.5 Pro at almost twice the price. Sonnet 4 fares worse than Sonnet 3.7, with the thinking version of Sonnet 4 being somewhat cheaper than its 3.7 counterpart.
I think we already hit some kind of performance wall at the beginning of this year. It feels like models are now balancing between rule following, agentic use cases, and general capability. E.g. Claude 4 Sonnet just feels better in Cursor and follows rules very well, yet at the same time it gets equal or worse benchmark scores than Sonnet 3.7.