The real answer is that nobody trusts their automated evals enough to be confident that any given automatically trained release actually improves performance, even if eval scores go up. So for now, everyone batches up updates and vibe-checks them before rolling them out.
