> In contrast, this article, i.e., the paper it discusses, is based on what has happened so far.
What happened in 2023 and 2024, actually.
Nitpicky but it's worth noting that last year's AI capabilities are not the April 2025 AI capabilities and definitely won't be the December 2025 capabilities.
It's using deprecated/replaced technology to make a statement that is not forward-projecting. I'm struggling to see the purpose. It's like announcing that the sun is still shining at 7pm, no?
I feel like model improvement is severely overstated by the benchmarks, and the last release cycle made basically no difference to my use cases. If you gave me Claude 3.5 and 3.7, I couldn't really tell the difference. OpenAI models feel like they are regressing, and Llama 4 regressed even on benchmarks.
And the hype was insane in 2023 already; it's useful to compare actual outcomes against historic hype to gauge how credible the hype sellers are.
That's interesting. I think there have been some pretty significant improvements in hallucination rates and model accuracy, especially when it comes to rule-following. Perhaps the biggest improvement, though, is in the size of context windows, which are huge compared to this time last year.
Maybe progress over the last 2-3 months is hard to see, but progress over the last 6 is very clear.