I don't understand why this comes as a surprise to a lot of people. Underlying all this are sequences of text tokens converted to semantic vectors, combined with some positional encoding, then run through matrix multiplications to compute the probability of the next token. That probability is a function of all the text corpora the model has previously consumed. You can run this process multiple times, chaining one output into the next one's input, but reasoning as humans know it is unlikely to emerge.
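
For concreteness, here's a minimal sketch of that loop in NumPy, with toy dimensions and random weights standing in for anything learned. None of this is any real model's architecture or parameters; it only shows the shape of the computation: embed the tokens, add positional encodings, run matrix multiplications, softmax over the vocabulary, sample the next token, append it, and repeat.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, D_MODEL, CONTEXT = 1000, 64, 32  # toy sizes, purely illustrative

# Random placeholders for what training would normally produce.
tok_emb = rng.normal(size=(VOCAB, D_MODEL))    # token embeddings
pos_emb = rng.normal(size=(CONTEXT, D_MODEL))  # positional encodings
W1 = rng.normal(size=(D_MODEL, D_MODEL))
W2 = rng.normal(size=(D_MODEL, VOCAB))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def next_token_probs(token_ids):
    """One forward pass: embeddings + positions, matmuls, softmax over vocab."""
    x = tok_emb[token_ids] + pos_emb[: len(token_ids)]  # (T, D_MODEL)
    h = np.maximum(x @ W1, 0)                           # crude nonlinearity
    logits = h[-1] @ W2                                  # predict from the last position
    return softmax(logits)                               # probability of each next token

def generate(prompt_ids, n_new=10):
    """Chain outputs into inputs: sample a token, append it, predict again."""
    ids = list(prompt_ids)
    for _ in range(n_new):
        probs = next_token_probs(ids[-CONTEXT:])
        ids.append(int(rng.choice(VOCAB, p=probs)))
    return ids

print(generate([1, 2, 3], n_new=5))
```

With random weights the output is gibberish; training only shifts those probabilities toward whatever the corpora contained. The chaining step in `generate` is all that "multi-step" generation amounts to at this level.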