
I first noticed this with Watson (IBM's language processing system) when it played Jeopardy!: when it was right, it was spot on and usually faster than the human contestants; but when it was wrong, it was way, way off base.

Part of that has to do with the fact that language is not the same for an LLM as it is for a person. If I say to you the sentence "The cat sat on the mat", that will evoke a picture, or at the very least an abstract sketch, in your mind, based on prior experience of cats, mats, and the sitting thereupon. Even aphantasic people can map utterances to aspects of their experience in ways that allow them to judge whether something makes sense. A phrase like "colorless green ideas sleep furiously" is arrant nonsense to just about everybody.

But LLMs have no experiences. To an LLM, utterances are tokens carrying statistical information about how they relate to one another: nodes in a graph with weighted edges, or something like it. If you say to an LLM "Explain to me how colorless green ideas can sleep furiously", it might respond with "Certainly! Ideas come in a variety of colors, including green and colorless..."
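To make the "tokens with statistical relations" point concrete, here is a deliberately crude sketch: a bigram model (vastly simpler than a real LLM, which uses learned continuous representations) that "knows" only which token tends to follow which, with no grounding in any experience of cats or mats.

```python
from collections import Counter, defaultdict

# Toy corpus: the model's entire "world" is these token co-occurrences.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each token follows each other token.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_token(prev):
    # Return the statistically most likely continuation.
    # There is no meaning here, only edge weights between tokens.
    return follows[prev].most_common(1)[0][0]

print(next_token("sat"))  # -> "on": "sat" was followed by "on" twice
```

Feed it a nonsense prompt made of familiar tokens and it will happily continue, because nothing in the machinery checks the continuation against a world; it only checks it against co-occurrence statistics.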

I've always found Searle's argument in the Chinese Room thought experiment fascinating, if wrong; my traditional response to it was "the man in the room does not understand Chinese, but the algorithm he's running might". I've been revisiting this thought experiment recently, and think Searle may have been less wrong than I'd first guessed. At a minimum, we can say that we do not yet have an algorithm that can understand Chinese (or English) the way we understand Chinese (or English).


