Surely we do know why: reinforcement learning for reasoning. These systems are trained to generate reasoning steps that lead to verified correct conclusions. There are no guarantees about how they'll perform on different problems, of course, but in relatively narrow, closed domains like math and programming, it isn't surprising that, done at scale, there are enough similar problems that similar reasoning will transfer and succeed. A toy sketch of that training loop is below.
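A minimal sketch of the loop, assuming a binary verifier reward and a crude preference update standing in for a real policy-gradient step. The canned "strategies" and names like verify() are hypothetical stand-ins for sampling token-level reasoning chains from an LLM:

    import random

    # Toy RL-with-verifiable-reward loop. The "policy" here is a weighted
    # choice over a few canned reasoning strategies; a real system samples
    # token-level reasoning chains from a model instead. All names are
    # illustrative, not from any particular framework.
    STRATEGIES = ["add_then_check", "subtract", "guess"]
    weights = {s: 1.0 for s in STRATEGIES}  # unnormalized policy

    def solve(a: int, b: int, strategy: str) -> int:
        """Apply a reasoning strategy to the toy task: compute a + b."""
        if strategy == "add_then_check":
            return a + b                  # sound reasoning
        if strategy == "subtract":
            return a - b                  # systematically wrong
        return random.randint(0, 20)      # unprincipled guess

    def verify(a: int, b: int, answer: int) -> float:
        """Verifier: reward 1.0 iff the final answer is correct."""
        return 1.0 if answer == a + b else 0.0

    def sample_strategy() -> str:
        """Sample a strategy with probability proportional to its weight."""
        r = random.uniform(0, sum(weights.values()))
        for s, w in weights.items():
            r -= w
            if r <= 0:
                return s
        return STRATEGIES[-1]

    LR = 0.1
    for _ in range(2000):
        a, b = random.randint(0, 10), random.randint(0, 10)
        s = sample_strategy()
        reward = verify(a, b, solve(a, b, s))
        # Reinforce strategies whose conclusions the verifier accepted;
        # 0.5 is a fixed baseline so wrong answers are penalized.
        weights[s] = max(0.01, weights[s] + LR * (reward - 0.5))

    print(weights)  # mass concentrates on "add_then_check"

The point falls out directly: the verifier only checks final answers, never the reasoning itself, yet the strategy that generalizes (actually adding) is the one that keeps getting rewarded across similar problems.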


We don't know why that is sufficient to enable the models to develop the capability, and we don't know what they are actually doing under the hood when they employ the capability.



