
The article describes an internal state of the model that is preserved between lines (the "rabbit" example), and how the model combines parallel calculations to arrive at a single answer (the math problem).

People output one token (word) at a time when talking. Does that mean people can only think one word in advance?



While there are numerous neural network models, the ones I recall the details of are trained to predict the next word. They aren't trained to hold some more abstract 'thought' as they run. Simpler models can't do so at all. More complex models do retain state between passes and don't rely entirely on their output being fed back in as input, but that internal state is rarely what training targets.
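
To make that concrete, here's a minimal sketch of the loop being described, assuming the standard HuggingFace transformers API with GPT-2 as a stand-in model: the network is trained purely on next-token prediction, and the only state carried between passes is the KV cache, which gets reused each step but is never a training target itself.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    ids = tok("The rabbit", return_tensors="pt").input_ids
    past = None  # internal state carried across passes (the KV cache)

    with torch.no_grad():
        for _ in range(10):
            # once a cache exists, feed only the newest token
            out = model(ids if past is None else ids[:, -1:],
                        past_key_values=past, use_cache=True)
            past = out.past_key_values  # reused next pass, never trained on directly
            next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)  # greedy next token
            ids = torch.cat([ids, next_id], dim=-1)

    print(tok.decode(ids[0]))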

As for humans, part of our brain is trained to think only a few words in advance. Maybe not exactly one, but only a small number. This is trained by our time listening to and reading information presented in that linear fashion, and it's why garden path sentences throw us off. We can disengage that part of our brain, and we must when we want to process something like a garden path sentence. But that's one of the differences between a neural network that works only as data passes through the weights and our mind, which never stops even as we sleep and external input is (mostly) cut off. An AI that runs constantly like that would seem a fundamentally different model than the current AI we use.


Bad analogy: an LLM can output a block of text all at once, and it wouldn't impact the user's ability to understand it. If people spoke all the words in a sentence at the same time, it would not be decipherable. Even writing doesn't yield a good analogy: a human physically has to write one letter at a time. An LLM does not have that limitation.


The point I'm trying to make is that "each word following the last" is a limitation of the medium, not the speaker.

Language expects/requires words in order. Both people and LLMs produce that.

If you want to get into the nitty-gritty, people are perfectly capable of doing multiple things simultaneously as well, using:

- interrupts to handle task-switching (simulated multitasking)

- independent subconscious actions (real multitasking)

- superpositions of multiple goals (??)


Some people don’t even do that!



