I'm sure some of the people working at Theranos thought there legitimately was a revolutionary blood-test machine.

The presence of a person who wants SWE-bench to have honest results and takes it seriously does not mean the results are free of perverse incentives, nor that everyone is behaving just as honestly.



When SWE-bench was new in 2023, it was, with all due respect, a bit of a niche benchmark in LLM research. LLMs were so incredibly useless at solving these tasks that I think the original academic authors deserve a bit more empathy. I don't think the Theranos example applies. Even the flawed benchmark was good enough to get us from ~GPT-4 to Claude 4's coding ability.



