So the author has a clear conflict of interest regarding the contents of the blog, since he's an employee of Anthropic. But as for this "blog": showing the graph where OpenAI compares "frontier" models by putting gpt-4o against o3-high is just disingenuous; o1 vs o3 would have been a much closer fight between "frontier" models. Also, today I learned that there are people paid to benchmark AI models on how close they are to "human" level, apparently even "expert" level, whatever that means. I'm not an LLM hater by any means, but I can confidently say that they aren't experts in any field.