Hacker News

I'm not surprised. Did people really think the models would just keep getting better and better?


The models are getting better and better.


That's expected. No one will release a worse model.


Not a cheaper one, or one that's better in some ways, or one with lower latency, etc.?


They do that too, but right now it is an arms race as well.


Maybe. How would I know?


...even if the agent did "cheat", I think that having the capacity to figure out that it was being evaluated, find the repo containing the logic of that evaluation, and find the expected solution to the problem it faced... is "better" than anything the models were able to do a couple of years ago.



