
I think another argument is that CoT simply unrolls the recurrent loop this method uses, adding an unembedding -> embedding round trip at every decoding step (hidden state -> token -> hidden state).

So at best, the recurrent loop only saves you that unembedding -> embedding round trip at each token, which is relatively cheap compared with running the full depth of the decoder stack.
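
To put a rough number on "relatively small", here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the model sizes are hypothetical, each decoder block is taken to hold about 12 * d_model^2 weights (attention projections plus a 4x MLP), a matmul costs 2 FLOPs per weight per token, the embedding is a table lookup counted as free, and attention over the context is ignored (including it would only shrink the share further).

```python
def decoder_stack_flops(d_model: int, n_layers: int) -> int:
    """Approximate per-token forward FLOPs for the decoder blocks.

    Assumes ~12 * d_model^2 weights per block (4 d^2 for the attention
    projections, 8 d^2 for an MLP with 4x expansion) and 2 FLOPs per weight.
    """
    return 2 * 12 * d_model**2 * n_layers


def unembedding_flops(d_model: int, vocab_size: int) -> int:
    """Per-token FLOPs for projecting the hidden state onto the vocabulary."""
    return 2 * d_model * vocab_size


def round_trip_share(d_model: int, n_layers: int, vocab_size: int) -> float:
    """Fraction of per-token compute spent on the unembedding -> embedding
    round trip. The embedding lookup is treated as zero FLOPs (assumption)."""
    blocks = decoder_stack_flops(d_model, n_layers)
    unembed = unembedding_flops(d_model, vocab_size)
    return unembed / (blocks + unembed)


if __name__ == "__main__":
    # Hypothetical sizes, not taken from any specific model.
    share = round_trip_share(d_model=4096, n_layers=32, vocab_size=32000)
    print(f"round trip share of per-token compute: {share:.1%}")
```

Under these assumptions the share simplifies to vocab_size / (12 * d_model * n_layers + vocab_size), so it comes out at a few percent for the sizes above and grows for small models with large vocabularies.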


