I'm wondering if it has something to do with the numeric range of the values being chosen.
For example, if a configuration ended up using fp16/bfloat16 to store an index into the (byte-pair-encoded) token table, then you would get errors only for tokens with larger IDs.
The text
"Keep it at the dial, in right on Pitch."
tokenizes to
19999 433 520 279 28960 11 304 1314 389 46676 13
Only Keep, dial, and Pitch map to IDs large enough to be susceptible to precision errors; the common structural words and punctuation all have small IDs that survive intact, which would explain why it could still generate valid enough sentence structure.
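This is easy to sanity-check. A quick sketch using PyTorch (any framework with fp16/bfloat16 would do), round-tripping the token IDs from the sentence above through each format:

```python
import torch

# Token IDs from the example sentence above.
token_ids = [19999, 433, 520, 279, 28960, 11, 304, 1314, 389, 46676, 13]

for t in token_ids:
    fp16 = int(torch.tensor(t, dtype=torch.float16))   # 11-bit significand: integers above 2048 get rounded
    bf16 = int(torch.tensor(t, dtype=torch.bfloat16))  # 8-bit significand: integers above 256 get rounded
    if fp16 != t or bf16 != t:
        print(f"{t}: fp16 -> {fp16}, bfloat16 -> {bf16}")
```

With these IDs, 19999 (Keep) comes back as 20000 in fp16 and 19968 in bfloat16, and 46676 (Pitch) as 46688 and 46592; the small IDs for "it", "at", "the" and the punctuation survive exactly.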
There are all kinds of “random”. AFAIK, ChatGPT finds the set of most likely next tokens, sorted by likelihood, then samples from that set, favoring the more likely ones and straying further from the top picks in proportion to the “temperature” parameter.
So even if the chat goes astray, the sequence continues along a path that still feels natural, because it is the most natural way to go once it has gone astray.
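For what it's worth, here is a minimal sketch of that kind of sampling in NumPy. The top_k cutoff and the temperature value are illustrative assumptions; real deployments use some combination of top-k/top-p truncation, and the exact settings aren't public:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=40, rng=None):
    """Keep the top_k most likely candidates, then sample among them.
    Higher temperature flattens the distribution, so picks stray
    further from the single top choice. Requires temperature > 0."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    top = np.argsort(logits)[-top_k:]      # indices of the most likely tokens
    scaled = logits[top] / temperature     # temperature rescales the logits
    scaled -= scaled.max()                 # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(top, p=probs))

# Example with a fake vocabulary of 50257 tokens (GPT-2's size, for illustration).
logits = np.random.default_rng(0).normal(size=50257)
print(sample_next_token(logits, temperature=0.8))
```

As temperature approaches 0 this degenerates into always picking the single most likely token; raising it makes the off-the-top picks more common while still drawing from plausible continuations.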