
Statements like these are useless unless you share exactly which models you've tried. Sonnet beats O1 Pro Mode, for example? Not in my experience, but I haven't tried the latest Sonnet versions, only the one before, so I wouldn't claim O1 Pro Mode beats everything out there.

Besides, it's so heavily context-dependent that you really need your own private benchmarks to make heads or tails of this whole thing.





