Hacker News new | past | comments | ask | show | jobs | submit login

Does the linked study actually check that the LLM solves the task correctly, or just that the code runs and terminates without errors? I'm bad at reading, but the paper feels like it's saying the latter, which doesn't seem that useful.





Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: