
It could be argued that "thinking" / CoT in latent space abstracts away the language issue, and that the language used in the reasoning steps doesn't actually matter: the latent tokens could be decoded afterwards into any target language. Much more powerful IMO.
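
A minimal sketch of that idea (PyTorch; the module names, dimensions, and per-language heads are hypothetical, not from any particular paper): the "thinking" loop iterates in hidden-state space and a language only enters the picture at the final unembedding, so the same latent trajectory can be surfaced in English or French.

    # Toy sketch: language-agnostic latent reasoning, language-specific decoding.
    import torch
    import torch.nn as nn

    HIDDEN, VOCAB = 64, 1000

    class LatentReasoner(nn.Module):
        def __init__(self, n_latent_steps: int = 8):
            super().__init__()
            # Stand-in for a full decoder block stack.
            self.step = nn.Linear(HIDDEN, HIDDEN)
            self.n_latent_steps = n_latent_steps
            # One unembedding head per target language; the loop itself is shared.
            self.heads = nn.ModuleDict({
                "en": nn.Linear(HIDDEN, VOCAB),
                "fr": nn.Linear(HIDDEN, VOCAB),
            })

        def forward(self, h: torch.Tensor, lang: str) -> torch.Tensor:
            # "Thinking" happens here: no tokens, hence no language.
            for _ in range(self.n_latent_steps):
                h = torch.tanh(self.step(h))
            # Language only matters when projecting back to a vocabulary.
            return self.heads[lang](h)

    model = LatentReasoner()
    h0 = torch.randn(1, HIDDEN)
    logits_en = model(h0, "en")  # same latent trajectory,
    logits_fr = model(h0, "fr")  # two different surface languages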

On a side note, there's decent research showing that bilingual humans do actually think in both languages, and are actually better at decision-making outside of their mother tongue.



I think another argument is that CoT is simply unrolling the recurrent loop that this method uses, with an extra unembedding -> embedding round trip at each step of the decoding process.

So at best, the recurrent loop only saves you that embedding -> unembedding round trip at each token, which is relatively small compared with the depth of the decoder block stack.
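
To make that bookkeeping concrete, here's a toy sketch (PyTorch, made-up dimensions, not any real model) of the two loops: both run the same tall block stack, and explicit CoT just inserts the unembed -> embed round trip between steps, which is exactly the thin part the latent recurrence skips.

    # Toy comparison: explicit CoT step vs latent recurrent step.
    import torch
    import torch.nn as nn

    HIDDEN, VOCAB, DEPTH = 64, 1000, 12

    # The tall part: DEPTH stacked blocks (stand-ins for transformer layers).
    blocks = nn.Sequential(*[nn.Linear(HIDDEN, HIDDEN) for _ in range(DEPTH)])
    unembed = nn.Linear(HIDDEN, VOCAB)   # hidden state -> token logits
    embed = nn.Embedding(VOCAB, HIDDEN)  # token -> hidden state

    def cot_step(h: torch.Tensor) -> torch.Tensor:
        # Explicit CoT: collapse to a discrete token, then re-embed it.
        h = blocks(h)
        token = unembed(h).argmax(dim=-1)
        return embed(token)  # the round trip the latent loop avoids

    def latent_step(h: torch.Tensor) -> torch.Tensor:
        # Recurrent latent reasoning: feed the hidden state straight back.
        return blocks(h)

    # Per step, the saving is two thin projections (unembed + embed)
    # against DEPTH full blocks -- small, as argued above.
    h = torch.randn(1, HIDDEN)
    for _ in range(4):
        h = latent_step(h)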



