Hacker News

Reduced memory consumption for context, perhaps, but hidden state is different from weights. I don't think this would improve the model's capability per parameter (but as with everything in ML, I wouldn't bet against it until it's been tested).

