
Yeah, heavily distilled models hallucinate a lot. I think the hallucinations cover for their reduced capacity: a 1B model will happily attempt the same complex coding tasks as a 1T model, but the hard parts get pushed into an API call that doesn't exist, lol.

