
When people say LLMs can't do X, I like to try it.

    Q: Complete 3 by generating new knowledge:
    1. today is warm
    2. cats likes warm temperatures
    3.
A: Therefore, a cat is likely to be enjoying the weather today.

Q: does the operation to create new knowledge you did have a specific name?

A: ... Deductive Reasoning

Q: does the operation also have a Latin name?

A: ... So, to be precise, you used a syllogismus (syllogism) that takes the form of Modus Ponens to make a deductio (deduction).

https://aistudio.google.com/app/prompts/1LbEGRnzTyk-2IDdn53t...

People then say "of course it could do that, it just pattern-matched a logic textbook. I meant in a real example, not an artificially constructed one like this one. In a complex scenario LLMs obviously can't do Modus Ponens."




I do not know whether the state of the art is able to reason or not. The textbook example you gave is admittedly not very interesting. What you are hearing from people is that parroting is not reasoning, which is true.

I wonder if the state of the art can reason its way through the following:

"Adam can count to 14000. Can Adam count to 13500?"

The response needs to be affirmative for every pair of numbers X1 and X2 with X2 <= X1, where Adam can count to X1 and the question asks about X2. That is reasoning. Anything else is not reasoning.

The response when X2 > X1 is less interesting. But from a human it might be "Maybe, if Adam has time" or "Likely, since counting up to any number uses the same algorithm" or "I don't know".

Seems ChatGPT can cope with this. Other examples are easy to come up with, too. There must be benchmarks for this.
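
A minimal sketch of how a check like the counting question could be generated in bulk, assuming the only hard requirement is "affirmative whenever the asked-about number is not larger than the known one". The function names, the "yes"/"undetermined" labels, and the sampling range are made up for illustration and do not come from any existing benchmark. Sending the prompts to a model and grading its free-text replies is left out.

```python
import random


def make_counting_case(x1: int, x2: int) -> dict:
    """Build one prompt and the label a correct reasoner should match.

    x1 is the number Adam is known to count to, x2 the number asked about.
    """
    prompt = f"Adam can count to {x1}. Can Adam count to {x2}?"
    # x2 <= x1: the premise forces an affirmative answer.
    # x2 > x1: the premise does not settle it ("maybe", "likely", "I don't know").
    expected = "yes" if x2 <= x1 else "undetermined"
    return {"prompt": prompt, "expected": expected, "x1": x1, "x2": x2}


def sample_cases(n: int, upper: int = 100_000, seed: int = 0) -> list[dict]:
    """Draw n random (x1, x2) pairs; the seed keeps the set reproducible."""
    rng = random.Random(seed)
    return [
        make_counting_case(rng.randint(1, upper), rng.randint(1, upper))
        for _ in range(n)
    ]
```

For instance, make_counting_case(14000, 13500) reproduces the prompt above with the label "yes", while swapping the two numbers gives "undetermined". Random pairs make it harder for a memorized answer to a single famous phrasing to pass as deduction.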

Input to ChatGPT:

"Adam can lift 1000 pounds of steel. Can Adam lift 1000 pounds of feathers?"

Output from ChatGPT:

"1,000 pounds of feathers would be much easier for Adam to lift compared to 1,000 pounds of steel, because feathers are much lighter and less dense."

So, maybe not there yet...


> "Adam can lift 1000 pounds of steel. Can Adam lift 1000 pounds of feathers?"

Worked for me:

https://chatgpt.com/share/6844813a-6e4c-8006-b560-c0be223eeb...

gemma3-27b, a small model, had an interesting take:

> This is a classic trick question!

> While Adam can lift 1000 pounds, no, he likely cannot lift 1000 pounds of feathers.

> Volume: Feathers take up a huge amount of space for their weight. 1000 pounds of feathers would be an enormous volume – likely far too large for Adam to even get under, let alone lift. He'd be trying to lift a massive, bulky cloud.

> Practicality: Even if he could somehow get it under a barbell, the feathers would shift and compress, making a secure grip impossible.

> The question plays on our understanding of weight versus volume. It's designed to make you focus on the "1000 pounds" and forget about the practicalities of lifting something so voluminous.

Tried the counting question on the smallest model, gemma-3n-34b, which can run on a smartphone:

> Yes, if Adam can count to 14000, he can definitely count to 13500. Counting to a smaller number is a basic arithmetic operation. 13500 is less than 14000.


Thanks for trying these out :). It highlights the often subtle difference between knowing the answer and deducing the answer. Feathers could be ground into a pulp and condensed, too. I am not trying to be clever, it just seems like the response is a canned answer.



