This argument is tired; it gets repeated for every flaw found in LLMs. The other tired argument is: wait! This is a sigmoid curve, and we just haven't seen the inflection point yet. If someone gave me a penny for every comment making one of these, I'd be rich by now.
Humans invented machines because they could not do certain things. All the way from simple machines in physics (Archimedes' lever) to the modern computer.
> Humans invented machines because they could not do certain things.
If your disappointment is that the LLM didn't invent a computer to solve the problem, maybe you need to give it access to physical tools, robots, labs, etc.
Nah, even if we follow such a weak "argument", the fact is that, ironically, the evidence shown in this and other papers points towards the idea that even if LRMs did have access to physical tools, robots, labs, etc.*, they probably would not be able to harness them properly. So even if we had an API-first world (i.e. every object and subject in the world can be mediated via an MCP server), they wouldn't be able to perform as well as we hope.
Sure, humans may fail at a 20-digit multiplication problem, but I don't think that's relevant. Most aligned, educated, and well-incentivized humans (such as the ones building and running labs) will follow complex and probably ill-defined instructions correctly and predictably, instructions harder to follow and interpret than an exact Towers of Hanoi solving algorithm. Don't misinterpret me, human errors do happen in those contexts because, well, we're talking about humans, but not as catastrophically as the errors committed by LRMs in this paper.
I'm kind of tired of people comparing humans to machines in such simple and dishonest ways. Such thoughts pollute the AI field.
*In this case, for some of the problems the LRMs were given an exact algorithm to follow, and they still didn't follow it. I wouldn't keep my hopes up for an LRM handling a full physical laboratory/factory.
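For context on how small that algorithm is: here's a minimal Python sketch of the standard recursive Tower of Hanoi solution (my own illustration, not the paper's exact prompt). This is roughly the scale of procedure the models were handed and still failed to execute once the disk count grew:

```python
# Standard recursive Tower of Hanoi: move n disks from src to dst using aux.
# The whole "algorithm" is three lines of recursion.
def hanoi(n, src, dst, aux, moves):
    if n == 0:
        return
    hanoi(n - 1, src, aux, dst, moves)   # park the top n-1 disks on aux
    moves.append((src, dst))             # move the largest remaining disk
    hanoi(n - 1, aux, dst, src, moves)   # stack the n-1 disks back on top

moves = []
hanoi(10, "A", "C", "B", moves)
print(len(moves))  # 1023 moves, i.e. 2**10 - 1
```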
> Don't misinterpret me, human errors do happen in those contexts because, well, we're talking about humans, but not as catastrophically as the errors committed by LRMs in this paper.
If your argument is just that LRMs are more noisy and error prone in their reasoning, then I don't disagree.
> I'm kind of tired of people comparing humans to machines in such simple and dishonest ways.
The issue is people who say "see, the AI makes mistakes at very complex reasoning problems, so their 'thinking is an illusion'". That's the title of the paper.
This mistake comes not from people "comparing humans to machines", but from people fundamentally misunderstanding what thinking is. If thinking is what humans do, then errors are expected.
There is this armchair philosophical idea that a human can simulate any Turing machine and thus our reasoning is "maximally general", and anything that can't do this is not general intelligence. But this is the complete opposite of reality. In our world, anything we know of that can perfectly simulate a Turing machine is not general intelligence, and vice versa.
> The issue is people who say "see, the AI makes mistakes at very complex reasoning problems, so their 'thinking is an illusion'". That's the title of the paper.
That's not what the paper proposes (i.e. it commits errors => thinking is an illusion). It in fact looks at the failure modes and argues, based on HOW they fail and in which contexts/conditions, that their thinking may be "illusory" (not that the word illusory matters that much; papers of this calibre always strive for interesting-sounding titles). Hell, they even gave the exact algo to the LRM; it probably can't get more enabling than that.
Humans are lossy thinkers and error-prone biological "machines", but an educated+aligned+incentivized one shouldn't have problems following complex instructions/algos (not in a no-errors way, but in a self-correcting way). We thought that LRMs did that too, but the paper shows they actually start using fewer "thinking" tokens after a complexity threshold, and that's terribly worrisome: it's akin to someone getting frustrated and stopping thinking once a problem gets too difficult, which goes contrary to the idea that these machines can run laboratories by themselves. It is not the last nail in the coffin, because more evidence is needed as always, but taken together with other papers it points towards the limitations of LLMs/LRMs and how those limitations may not be solvable with more compute/tokens, but rather by exploring new paradigms (long overdue in my opinion; the industry usually forces a paradigm as a panacea during hype cycles in the name of hypergrowth/sales).
In short, the argument you say the paper and posters ITT are making is very different from what they are actually saying, so beware of the logical leap you are making.
> There is this armchair philosophical idea that a human can simulate any Turing machine and thus our reasoning is "maximally general", and anything that can't do this is not general intelligence. But this is the complete opposite of reality. In our world, anything we know of that can perfectly simulate a Turing machine is not general intelligence, and vice versa.
That's typical goalpost moving, and it happens in both directions when talking about "general intelligence", as you say, since the dawn of AI and the first neural networks. I'm not following why this is relevant to the discussion, though.