
I see this sort of take from a lot of people and I always tell them to do the same exercise. A cure for baseless fears.

Pick an LLM. Any LLM.

Ask it what the goat river crossing puzzle is. With luck, it will tell you about the puzzle involving a boatman, a goat, some vegetable, and some predator. If it doesn’t, it’s disqualified.

Now ask it to do the same puzzle but with two goats and a cabbage (or whatever vegetable it has chosen).

It will start by taking a goat across, whereupon the other goat eats the cabbage left with it on the shore.

Hopefully this exercise teaches you something important about LLMs.
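For anyone who wants to check the variant mechanically rather than by eyeballing a chat transcript, here is a minimal brute-force sketch. The solve helper and the rules it encodes are my own assumptions, not anything the models expose: one boatman, a boat that carries at most one item, a goat eats the cabbage whenever the two share a bank without the boatman, and goats never harm each other.

    # Brute-force check of the two-goats-and-a-cabbage variant.
    # Assumptions (mine): one boatman, a one-item boat, and a goat eats
    # the cabbage whenever they share a bank without the boatman.
    from collections import deque

    def solve(goats=2, cabbages=1):
        """BFS over bank states; returns a list of crossings, or None."""
        start = (goats, cabbages, 'L')   # (goats on left, cabbages on left, boatman side)
        goal = (0, 0, 'R')

        def safe(g, c):
            # A bank without the boatman is unsafe if a goat and a cabbage share it.
            return not (g > 0 and c > 0)

        frontier = deque([(start, [])])
        seen = {start}
        while frontier:
            (g, c, side), path = frontier.popleft()
            if (g, c, side) == goal:
                return path
            # Cargo options: nothing, one goat, or one cabbage.
            for dg, dc, label in [(0, 0, 'nothing'), (1, 0, 'a goat'), (0, 1, 'the cabbage')]:
                if side == 'L':
                    ng, nc = g - dg, c - dc               # cargo leaves the left bank
                    left_behind = (ng, nc)                # the bank the boatman abandons
                    nside = 'R'
                else:
                    ng, nc = g + dg, c + dc               # cargo returns to the left bank
                    left_behind = (goats - ng, cabbages - nc)
                    nside = 'L'
                if not (0 <= ng <= goats and 0 <= nc <= cabbages):
                    continue                              # that cargo isn't on this bank
                if not safe(*left_behind):
                    continue                              # something would get eaten
                state = (ng, nc, nside)
                if state not in seen:
                    seen.add(state)
                    frontier.append((state, path + [f'take {label} {side}->{nside}']))
        return None                                       # no safe sequence exists

    print(solve())   # 7 crossings, and the first move is the cabbage, not a goat

Under those assumptions, the shortest safe sequence is seven crossings and it has to start with the cabbage, which is exactly the move the memorized classic solution gets wrong.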



https://chatgpt.com/share/6760a122-0ec4-8008-8b72-3e950f0288...

My first try with o1. Seems right to me…what does this teach us about LLMs :)?


Let's ask for 3 goats then. And how much did developing o1 cost, and how much will the next version cost? X billion dollars per goat is not good scaling when any number of goats or cabbages can exist.
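For what it's worth, feeding three goats into the brute-force sketch from the comment above suggests there is no safe sequence at all with a one-item boat, so the honest answer to that prompt would be that it is impossible, at least under the assumptions baked into that sketch:

    print(solve(goats=3, cabbages=1))   # None: no safe crossing order exists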


Emmm... I think your argument is not valid any more:

https://chatgpt.com/c/6760a0a0-fa34-800c-9ef4-78c76c71e03b


Seems like they caught up because I have posted this before, including in ChatGPT. All that means is you have to change it up slightly.

Unfortunately, “change it up slightly” is not specific enough for people to do anything with, and anything more specific eventually trains the LLM, so it stops proving the point.

I cannot load this link though.


It also means that you should update your belief about the reasoning capabilities of LLMs at least slightly. If disconfirming evidence doesn't shake your beliefs at all, you don't really have beliefs, you have an ideology.


The observation here is far too easily explained in other ways to be considered particularly strong evidence.

Memorizing the solution to a classic brainteaser is not the same as having the reasoning skills needed to solve it. Learning separate solutions to related problems might let someone pattern-match, but not understand. This is about as true for humans as for LLMs. Lots of people ace their courses, even at university level, while being left with questions that demonstrate a stunning lack of comprehension.


Or it just means anything shared on the internet gets RLHF’d / special cased.

It’s been clear for a long time that the major vendors have been watching online chatter and tidying up well-known edge cases by hand. If you have a test that works, it will keep working as long as you don’t share it widely enough to get their attention.


"It also means that you should update your belief about the reasoning capabilities of LLMs at least slightly."

AI, LLMs, ML: these have no reasoning ability. They're not human; they are software machines, not people. People reason; machines calculate and imitate, they do not reason.


People are just analog neural circuits. They are not magic. The human brain is one physical implementation of intelligence; it is not the only one possible.


I don't want to sound like a prick, but this is generally what people mean when they say "things are moving very fast due to the amount of investment". Beliefs that are six months old can be scrapped because of new models, new research, etc.


Where "etc." includes lots of engineers who are polishing up the frontend to the model which translates text to tokens and vice versa, so that well-known demonstrations of the lack of reasoning in the model cannot be surfaced as easily as before? What reason is there to believe that the generated output, while often impressive, is based on something that resembles understanding, something that can be relied on?




