
Consistency is a strange criterion, seeing as humans aren't very consistent either. Intelligent beings can make mistakes.


This is a common category of error people commit when talking about LLMs.

"True, LLMs can't do X, but a lot of people don't do X well either!"

The problem is, when you say humans have trouble with X, what you mean is that human brains are fully capable of X, but sometimes they do, indeed, make mistakes. Or that some humans haven't trained their faculties for X very well, or whatever.

But LLMs are fundamentally, completely, incapable of X. It is not something that can be a result of their processes.

These things are not comparable.

So, to your specific point: When an LLM is inconsistent, it is because it is, at its root, a statistical engine generating plausible next tokens, with no semantic understanding of the underlying data. When a human is inconsistent, it is because they got distracted, didn't learn enough about this particular subject, or otherwise made a mistake that they can, if their attention is drawn to it, recognize and correct.

LLMs cannot. They can only be told they made a mistake, which prompts them to try again (because that's the pattern that has been trained into them for what happens when told they made a mistake). But their next try won't have any better odds of being correct than their previous one.


>But LLMs are fundamentally, completely, incapable of X. It is not something that can be a result of their processes.

This is the very point of contention. You don't get to just assume it.

> it is because it is, at its root, a statistical engine generating plausible next tokens, with no semantic understanding of the underlying data.

Another highly contentious point you are just outright assuming. LLMs are modelling the world, not just "predicting the next token". Some examples here[1][2][3]. Anyone claiming otherwise at this point is not arguing in good faith. It's interesting how the people with the strongest opinions about LLMs don't seem to understand them.

[1] https://arxiv.org/abs/2405.15943

[2] https://x.com/OwainEvans_UK/status/1894436637054214509

[3] https://www.anthropic.com/research/tracing-thoughts-language...


OK, sure; there is some evidence potentially showing that LLMs are constructing a world model of some sort.

This is, however, a distraction from the point, which is that you were claiming the described lack of consistency in LLMs shouldn't be considered a problem because "humans aren't very consistent either."

Humans are perfectly capable of being consistent when they choose to be. Human variability and fallibility cannot be used to handwave away lack of fundamental ability in LLMs. Especially when that lack of fundamental ability is on empirical display.

I still hold that LLMs cannot be consistent, just as TheOtherHobbes describes, and you have done nothing to refute that.

Address the actual point, or it becomes clear that you are the one arguing in bad faith.


You are misrepresenting the point of contention. The question is whether LLMs' lack of consistency undermines the claim that they "understand" in some relevant sense. But arguing that lack of consistency is a defeater for understanding is itself undermined by noting that humans are inconsistent but do in fact understand things. It's as simple as that.

If you want to alter the argument by saying humans can engage in focused effort to reach some requisite level of consistency for understanding, you have to actually make that argument. It's not at all obvious that focused effort is required for understanding or that a lack of focused effort undermines understanding.

You also need to contend with the fact that LLMs aren't really a single entity, but a collection of personas, and what you get and what it is capable of depend to a large degree on how you prompt it. Even if the entity as a whole is inconsistent between prompts, the right subset might very well be reliably consistent. There's also the fact that the temperature setting artificially injects randomness into the LLM's output; an LLM itself is entirely deterministic. It's not at all obvious how consistency relates to LLM understanding.
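To make the temperature point concrete, here is a minimal sketch in plain NumPy (not any real model's API; the logit values are made up for illustration). At temperature 0 the same logits always map to the same token, while any temperature above 0 samples from a softmax distribution, which is where the run-to-run randomness comes from:

    import numpy as np

    def sample_next_token(logits, temperature, rng):
        # Greedy decoding: temperature 0 means "always take the argmax",
        # so repeated calls with the same logits give the same token.
        if temperature == 0.0:
            return int(np.argmax(logits))
        # Otherwise: softmax over temperature-scaled logits, then draw
        # from that distribution -- this is the step that injects randomness.
        scaled = np.asarray(logits, dtype=np.float64) / temperature
        scaled -= scaled.max()  # numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return int(rng.choice(len(probs), p=probs))

    # Made-up logits for a toy 5-token vocabulary.
    logits = [2.0, 1.5, 0.3, -1.0, -2.5]
    rng = np.random.default_rng()

    print([sample_next_token(logits, 0.0, rng) for _ in range(5)])  # always [0, 0, 0, 0, 0]
    print([sample_next_token(logits, 1.0, rng) for _ in range(5)])  # varies from run to run

The forward pass that produces the logits is a fixed function of the input; whether the decoded text is reproducible is a property of the sampling step and the seed, which is the distinction being drawn above.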

Feel free to do some conceptual work to make an argument; I'm happy to engage with it. What I'm tired of are these half-assed claims and incredulity that people don't take them as obviously true.



