I'm wondering if it has something to do with the numeric range of the values being chosen.
For example, if a configuration ended up using fp16/bfloat16 to store an index into the (byte-pair-encoded) token table, then you would get errors only for tokens with larger IDs.
The text
"Keep it at the dial, in right on Pitch."
tokenizes to
19999 433 520 279 28960 11 304 1314 389 46676 13
Only Keep, dial, and Pitch map to IDs large enough to be susceptible to precision errors; the common structural words and punctuation all have small IDs that survive intact, which would explain why it could still generate valid enough sentence structure.
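This is easy to sanity-check. A quick sketch using PyTorch (any framework with fp16/bfloat16 would do), round-tripping the token IDs from the sentence above through each format:

```python
import torch

# Token IDs from the example sentence above.
token_ids = [19999, 433, 520, 279, 28960, 11, 304, 1314, 389, 46676, 13]

for t in token_ids:
    fp16 = int(torch.tensor(t, dtype=torch.float16))   # 11-bit significand: integers above 2048 get rounded
    bf16 = int(torch.tensor(t, dtype=torch.bfloat16))  # 8-bit significand: integers above 256 get rounded
    if fp16 != t or bf16 != t:
        print(f"{t}: fp16 -> {fp16}, bfloat16 -> {bf16}")
```

With these IDs, 19999 (Keep) comes back as 20000 in fp16 and 19968 in bfloat16, and 46676 (Pitch) as 46688 and 46592; the small IDs for "it", "at", "the" and the punctuation survive exactly.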
There are all kinds of “random”. AFAIK, ChatGPT finds the set of most likely next tokens, sorted by likelihood, then samples from that set, favoring the more likely ones and straying further from the top picks in proportion to the “temperature” parameter.
So even if the chat goes astray, the sequence continues along a path that still feels natural, because it is the most natural way to go once it has gone astray.
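For what it's worth, here is a minimal sketch of that kind of sampling in NumPy. The top_k cutoff and the temperature value are illustrative assumptions; real deployments use some combination of top-k/top-p truncation, and the exact settings aren't public:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=40, rng=None):
    """Keep the top_k most likely candidates, then sample among them.
    Higher temperature flattens the distribution, so picks stray
    further from the single top choice. Requires temperature > 0."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64)
    top = np.argsort(logits)[-top_k:]      # indices of the most likely tokens
    scaled = logits[top] / temperature     # temperature rescales the logits
    scaled -= scaled.max()                 # subtract max for numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return int(rng.choice(top, p=probs))

# Example with a fake vocabulary of 50257 tokens (GPT-2's size, for illustration).
logits = np.random.default_rng(0).normal(size=50257)
print(sample_next_token(logits, temperature=0.8))
```

As temperature approaches 0 this degenerates into always picking the single most likely token; raising it makes the off-the-top picks more common while still drawing from plausible continuations.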