Hacker News

But with the data held constant, adding more and more parameters is a strategy that still works, so what gives? Are the functions somehow being regularized during training, so that effectively you could get away with fewer parameters, and we just don't have the right model yet?

