> The best benchmark is the community vibe in the weeks following a release.
True, just be careful what community you use as a vibe-check. Most of the mainstream/big ones around AI and LLMs basically have influence campaigns run against them, are made of giant hive-minds that all think alike and you need to carefully asses if anything you're reading is true or not, and votes tend to make it even worse.
Claude benchmarks poorly but vibes well. Gemini benchmarks well and vibes well. Grok benchmarks well but vibes poorly.
(yes I know you are gushing with anecdotes, the vibes are simply the approximate color of gray born from the countless black and white remarks.)