
I see this sort of take from a lot of people and I always tell them to do the same exercise. A cure for baseless fears.

Pick an LLM. Any LLM.

Ask it what the goat river crossing puzzle is. With luck, it will tell you about the puzzle involving a boatman, a goat, some vegetable, and some predator. If it doesn’t, it’s disqualified.

Now ask it to do the same puzzle but with two goats and a cabbage (or whatever vegetable it has chosen).

It will start by taking a goat across, whereupon the other goat eats the cabbage left with it on the shore.

Hopefully this exercise teaches you something important about LLMs.
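For anyone who wants to check the variant mechanically rather than by eyeballing a chat transcript, here is a minimal brute-force sketch. The solve helper and the rules it encodes are my own assumptions, not anything the models expose: one boatman, a boat that carries at most one item, a goat eats the cabbage whenever the two share a bank without the boatman, and goats never harm each other.

    # Brute-force check of the two-goats-and-a-cabbage variant.
    # Assumptions (mine): one boatman, a one-item boat, and a goat eats
    # the cabbage whenever they share a bank without the boatman.
    from collections import deque

    def solve(goats=2, cabbages=1):
        """BFS over bank states; returns a list of crossings, or None."""
        start = (goats, cabbages, 'L')   # (goats on left, cabbages on left, boatman side)
        goal = (0, 0, 'R')

        def safe(g, c):
            # A bank without the boatman is unsafe if a goat and a cabbage share it.
            return not (g > 0 and c > 0)

        frontier = deque([(start, [])])
        seen = {start}
        while frontier:
            (g, c, side), path = frontier.popleft()
            if (g, c, side) == goal:
                return path
            # Cargo options: nothing, one goat, or one cabbage.
            for dg, dc, label in [(0, 0, 'nothing'), (1, 0, 'a goat'), (0, 1, 'the cabbage')]:
                if side == 'L':
                    ng, nc = g - dg, c - dc               # cargo leaves the left bank
                    left_behind = (ng, nc)                # the bank the boatman abandons
                    nside = 'R'
                else:
                    ng, nc = g + dg, c + dc               # cargo returns to the left bank
                    left_behind = (goats - ng, cabbages - nc)
                    nside = 'L'
                if not (0 <= ng <= goats and 0 <= nc <= cabbages):
                    continue                              # that cargo isn't on this bank
                if not safe(*left_behind):
                    continue                              # something would get eaten
                state = (ng, nc, nside)
                if state not in seen:
                    seen.add(state)
                    frontier.append((state, path + [f'take {label} {side}->{nside}']))
        return None                                       # no safe sequence exists

    print(solve())   # 7 crossings, and the first move is the cabbage, not a goat

Under those assumptions, the shortest safe sequence is seven crossings and it has to start with the cabbage, which is exactly the move the memorized classic solution gets wrong.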



https://chatgpt.com/share/6760a122-0ec4-8008-8b72-3e950f0288...

My first try with o1. Seems right to me…what does this teach us about LLMs :)?


Let's ask for 3 goats then. And how much did developing o1 cost, and how much will the next version cost? X billion dollars per goat is not good scaling when any number of goats or cabbages can exist.
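For what it's worth, feeding three goats into the brute-force sketch from the comment above suggests there is no safe sequence at all with a one-item boat, so the honest answer to that prompt would be that it is impossible, at least under the assumptions baked into that sketch:

    print(solve(goats=3, cabbages=1))   # None: no safe crossing order exists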


Emmm... I think your argument is not valid any more:

https://chatgpt.com/c/6760a0a0-fa34-800c-9ef4-78c76c71e03b


Seems like they caught up because I have posted this before, including in ChatGPT. All that means is you have to change it up slightly.

Unfortunately, “change it up slightly” is not specific enough for people to do anything with, and anything more specific eventually trains the LLM, so it stops proving the point.

I cannot load this link though.


It also means that you should update your belief about the reasoning capabilities of LLMs at least slightly. If disconfirming evidence doesn't shake your beliefs at all, you don't really have beliefs, you have an ideology.


The observation here is far too easily explained in other ways to be considered particularly strong evidence.

Memorizing the solution to a classic brainteaser is not the same as having the reasoning skills needed to solve it. Learning separate solutions to related problems might let someone pattern-match, but not understand. This is about as true for humans as for LLMs. Lots of people ace their courses, even at university level, while being left with questions that demonstrate a stunning lack of comprehension.


Or it just means anything shared on the internet gets RLHF’d / special cased.

It’s been clear for a long time that the major vendors have been watching online chatter and tidying up well-known edge cases by hand. If you have a test that works, it will keep working as long as you don’t share it widely enough to get their attention.


"It also means that you should update your belief about the reasoning capabilities of LLMs at least slightly."

AI, LLMs, ML: these have no reasoning ability. They're not human; they are software machines, not people. People reason; machines calculate and imitate, they do not reason.


People are just analog neural circuits. They are not magic. The human brain is one physical implementation of intelligence; it is not the only one possible.


I don't want to sound like a prick, but this is generally what people mean when they say "things are moving very fast due to the amount of investment". Beliefs that are six months old can be scrapped because of new models, new research, etc.


Where "etc." includes lots of engineers who are polishing up the frontend to the model which translates text to tokens and vice versa, so that well-known demonstrations of the lack of reasoning in the model cannot be surfaced as easily as before? What reason is there to believe that the generated output, while often impressive, is based on something that resembles understanding, something that can be relied on?




