

PunchTornado 4 months ago, on: Grok 4 Launch [video]

What? Nobody looks at those benchmarks. You use whatever works for your task, in most cases either Gemini or Claude. Those benchmarks don't mean anything, as models overfit on them.

esafak 4 months ago

Come on, the benchmarks do mean something, even if companies overfit to them. Models are indisputably improving together with their benchmark scores.
