Hacker News

In Figure 1, bottom-right, they show that correct answers are found later in the response as task complexity increases. In the description they even state that in false responses the LRM often focuses on a wrong answer early and then runs out of tokens before it can self-correct. This seems obvious, and it suggests it's simply a matter of scaling (a bigger token budget would lead to better performance on more complex tasks). Am I missing something?





Yes, the rest of the paper.




