
While the output is a single word (more precisely, a token), the internal activations are very high-dimensional and can already contain information about words that will only appear later. That information simply isn't surfaced by the final output layer. You can think of the internal feature vector as encoding the entire upcoming sentence/thought/paragraph/etc., with the last layer "projecting" it down to whatever the next word (token) has to be to continue expressing that "thought".
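
Here's a minimal sketch of that last-layer projection, with made-up sizes and numpy standing in for a real model (names like W_unembed are just illustrative):

    import numpy as np

    # Made-up sizes; real models use d_model in the thousands
    # and vocabularies of 50k+ tokens.
    d_model, vocab_size = 8, 10
    rng = np.random.default_rng(0)

    # Final hidden state for the current position: a high-dimensional
    # vector that can encode far more than just the next token.
    hidden = rng.normal(size=d_model)

    # The unembedding (output projection) matrix maps that vector down
    # to one score (logit) per vocabulary token.
    W_unembed = rng.normal(size=(d_model, vocab_size))
    logits = hidden @ W_unembed

    # Only this low-dimensional projection reaches the output; whatever
    # else the hidden state encodes about upcoming text stays internal.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    next_token = int(np.argmax(probs))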


But at some point the activations commit, with effectively 100% confidence, to the right word for the current slot. That word is output, and generation proceeds to the next one.

For a 500-token response, say, at some point the model was certain the first 25 words were the right ones, such that it will never take any of them back when it eventually computes the last 25.
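
A sketch of that append-only commitment as a greedy decoding loop, assuming a hypothetical model(tokens) callable that returns logits for the next position only:

    import numpy as np

    def generate(model, prompt_tokens, n_new):
        tokens = list(prompt_tokens)
        for _ in range(n_new):
            logits = model(tokens)                 # scores the next slot only
            tokens.append(int(np.argmax(logits)))  # commit; never revised
        return tokens

Nothing in the loop ever rewrites tokens[:-1]; each step only appends.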


This is true, but it doesn't mean the model settled on those first 25 without "considering" whether they could afterwards be continued meaningfully by a further 25. It does have some internal "lookahead" and generates text that "leads" somewhere. The rhyming example from the article is a great choice to illustrate this.
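
One way such lookahead gets tested empirically is with a linear probe: train a classifier on the hidden state at position t to predict the token at position t+k. A hedged sketch, with made-up random data standing in for real activations (a real experiment would extract states from an actual model and evaluate on held-out data):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d_model, vocab = 1000, 64, 50
    hidden_states = rng.normal(size=(n, d_model))   # states at position t
    future_tokens = rng.integers(0, vocab, size=n)  # tokens at position t+k

    # Least-squares linear probe against one-hot targets.
    targets = np.eye(vocab)[future_tokens]
    W_probe, *_ = np.linalg.lstsq(hidden_states, targets, rcond=None)
    preds = (hidden_states @ W_probe).argmax(axis=1)
    # Above-chance accuracy on real held-out data would suggest the
    # state at t already encodes the word coming at t+k.
    accuracy = (preds == future_tokens).mean()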



