
Absolute definitions are weak. They won't settle anything.

We know what we need right now: the next step. That step is a machine that, when it fails, fails in a human way.

Humans also make mistakes and hallucinate, but we do it as humans. When a human fails, you think, "damn, that's a mistake I or a friend of mine could have made."

LLMs, on the other hand, fail in a weird way. When they hallucinate, they demonstrate how non-human they are.

It has nothing to do with some special kind of interrogator. We must assume the best human interrogator possible. The next step I described works even with the most skeptical interrogator. It also synergizes with the idea of alignment in ways other tests don't.

When that step is reached, humans may or may not figure out another characteristic that makes it evident that "subject X" is a machine rather than a human, and a way to test for it.

Moving the goalpost is the only way forward. Not all goalpost moves are valid, but the valid next move is a goalpost move. It's kind of obvious.





This makes sense if we're trying to recreate a human mind artificially, but I don't think that's the goal?

There's no reason an equivalent or superior general intelligence needs to be similar to us at all


There's no substance to the idea of "superior intelligence". Nobody can say what it means, except by assuming that animal intelligence is in the same category as the kind we want, differing from human intelligence in degree rather than qualitatively, and then extrapolating forward as if we could read it off an intelligence meter that we don't actually have.

Besides which, we already defined "artificial intelligence" to mean non-intelligence: are we now going to attain "artificial general intelligence" by the same process? Should we add another letter to the acronym and move on to "genuine artificial general intelligence"?


Is there really no agreement on what intelligence refers to? I've seen it defined as the ability to reach a goal, which seemed clear to me. E.g., a chess AI rated 1500 Elo is more intelligent than one rated 1000.

That's capability; intelligence can also be how quickly it learned to reach that capability.

Consider the difference in intelligence between a kid who skipped five years of school vs one who was held back a year: if both got the same grade in the end, the one who skipped five years was smarter.


Makes sense. Maybe a combination of both would be most accurate: how fast you can learn plus what your peak capability is.

Looking at it solely as rate of learning makes LLMs way smarter than humans already, which doesn't seem right to say.


> Looking at it solely as rate of learning makes LLMs way smarter than humans already, which doesn't seem right to say

Sure, but "rate" also has two meanings, both useful, but importantly different: per unit of wall-clock time, and per example.

Transistors are just so much faster than synapses that computers can (somewhat) compensate for being absolutely terrible by the latter meaning, at least in cases where there are enough examples for them to learn from.

In cases where the supply of examples is too small (and cannot be augmented with synthetic data, simulations, and so on), state-of-the-art AI models still suck. In cases where there is sufficient data, for example self-play in games of chess and Go, the AI can be super-human by a substantial margin.
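To make the two meanings of "rate" concrete, here's a back-of-the-envelope sketch; every number in it is made up purely for illustration, not a real measurement:

    # Entirely hypothetical numbers, only to illustrate wall-clock rate vs per-example rate.
    human   = {"games": 5_000,      "hours": 10_000}  # a lifetime of study, say
    machine = {"games": 40_000_000, "hours": 100}     # a big self-play run, say

    for name, s in (("human", human), ("machine", machine)):
        print(f"{name}: {s['games'] / s['hours']:,.1f} games/hour, "
              f"{s['games']:,} games needed in total")

    # The machine wins overwhelmingly on games per hour (wall-clock rate)
    # while losing badly on games needed (per-example rate).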


LLMs are trained on human data and aimed at performing tasks in human roles. That's the goal.

It is supposed to be super, but super-human: able to interact with us.

Which leads us to the Turing Test (not really a test, either... "the imitation game" is more of a philosophical exploration of thinking machines).

My comment assumes this is already understood, as Turing explained it.

If the thing is not human-like, then there's absolutely no way we can evaluate it. There's no way we can measure it. It becomes an impossible task.


What's wrong with measuring and evaluating its outputs directly? If it can file taxes more accurately than we can, does it matter if it does it in a human manner?

Birds and planes both fly and all


If your definition of AGI is filing taxes, then it's fine.

Once we step into any other problem, you need to measure that problem as well. Lots of problems are concerned with how an intelligent being could fail. Our society is built on lots of those assumptions.


For _investment_ purposes, the definition of AGI is very simple: "to what extent can it replace human workers?"

From this perspective, "100% AGI" is achieved when AI can do any job that happens primarily on a computer. This can be extended to humanoid robots in the obvious way.


That's not what AGI used to mean a year or two ago. That's a corruption of the term, and using that definition of AGI is the mark of a con artist, in my experience.

I believe the classical definition is, "It can do any thinking task a human could do", but tasks with economic value (i.e. jobs) are the subset of that which would justify trillions of dollars of investment.

I don't see how that changes anything.

Failing like a human would is not a cute add-on. It's a fundamental requirement for creating AIs that can replace humans.


Industrial machines don't fail like humans yet they replaced human workers. Cars don't fail like horses yet they replaced them. ATMs don't fail like bank tellers... Why is this such a big requirement?

Microwaves didn't replace ovens. The Segway didn't replace bikes. 3D movies didn't replace IMAX. I can go on and on...

Some things fail, or fail to meet their initial overblown expectations.

The microwave oven was indeed a commercial success. And that's fine, but it sucks at being an oven. Everyone knows it.

Now, this post is more about the scientific part of it, not the commercial one.

What makes an oven better than a microwave oven? Why is pizza from an oven delicious and microwave pizza sucks?

Maybe there's a reason, some Maillard reaction that requires hot air convection and can't be replicated by shaking up water molecules.

We are talking about those kinds of things. What makes it tick, how does it work, etc. Not if it makes money or not.

Damn, the thing doesn't even make money yet. Why talk about a plus that the technology still doesn't have?


The thread we're in was arguing that the requirement for AGI is to fail the exact same way humans do. I used these examples to point out that failing the exact same way is not a requirement for a new technology to replace people or other technology. You're reading too much into what I said and putting words in my mouth.

What makes it tick is probably a more interesting question to me than to the AI skeptics. But they can't stop declaring some special quality (consciousness, awareness, qualia, reasoning, intelligence) that AI by their definition cannot ever have, and insisting that this quality is immeasurable, unquantifiable, undefinable... This is literally a thought-stopping semantic dead end that I feel the need to argue against.

Finally, it doesn't make money the same way Amazon and Uber didn't make money for a looong time: by bringing in lots of revenue, reinvesting it, and not caring about profit margins while in the growth stage. Will we seriously go through this for every startup? The industry is already at $10-20B a year at least, and that will keep growing.


AGI does not currently exist. We're trying to think about what we want from it. Like a perfect microwave oven: if a company says they're going to make a perfect microwave oven, I want the crusty dough and delicious gratin-cheese effect on my cooked focaccia-inspired meals.

What exists is LLMs, transformers, etc. Those are the microwave oven that gives you rubbery cheese and cardboard dough.

It seems that you are willing to cut some slack to the terrible microwave pizza. I am not.

You complained about immeasurable qualities, like qualia. However, I gave you a very simple, measurable quality: failing like a decent human would instead of producing gibberish hallucinations. I also explained in other comments on this thread why that measurable quality is important (it plays with existing expectations, just like existing expectations about a good pizza).

While I do care about those more intangible characteristics (consciousness, reasoning, etc.), I decided to concede and exclude them from this conversation from the get-go. It was you who brought them back in, from who-knows-where.

Anyway. It seems that I've addressed your points fairly. You had to reach for other skeptic-related narratives in order to keep the conversation going, and by that point, you missed what I was trying to say.


> The next step I described works even with the most skeptical interrogator.

To be a valid test, it still has to be passed by ~every adult human. The harder you make the test (in any direction), the more it fails on this important axis.


You are wrong. Please read the Turing paper:

https://courses.cs.umbc.edu/471/papers/turing.pdf

> A number of interrogators could be used, and statistics compiled to show how often the right identification was given

Turing determines that we need enough competent-interrogator trials just to establish statistical confidence, not ~everyone. I tend to agree with him on this.
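To make "statistics compiled" concrete, here's a minimal sketch of my own (the session counts are hypothetical, not from the paper): score a batch of interrogation sessions and ask how likely the interrogators' hit rate would be if they were purely guessing.

    from math import comb

    def right_id_pvalue(correct: int, trials: int, chance: float = 0.5) -> float:
        """Exact one-sided binomial p-value: probability of at least `correct`
        right identifications if every interrogator were purely guessing."""
        return sum(
            comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
            for k in range(correct, trials + 1)
        )

    # Hypothetical compilation: 100 interrogation sessions, 63 correct identifications.
    print(f"{right_id_pvalue(63, 100):.4f}")
    # Well below 0.05: these interrogators reliably beat chance, so the machine fails.
    # A large p-value would mean they do no better than guessing, i.e. the machine passes.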


Please reread that section. You'll discover it has nothing to do with whether humans can pass the test.

If you can find a part of the paper in which Turing really does claim that it is unnecessary for most adult humans to be able to pass the test, by all means quote it. But this would be a surprising thing for him to claim, because it would undermine the entire foundation of his Imitation Game.


Do you understand how using statistics to establish degrees of certainty works? That is a must-have for understanding academic work.

I think that if you did, you wouldn't be answering the way you are.

https://en.wikipedia.org/wiki/P-value


Your quote does not back up your claim.

My original claim was that the Turing test needs to be passable by ~every adult human. You counterclaimed that Turing himself didn't think so, and provided that quote from the IG paper as evidence. But that quote is in a section about testing digital computers, not humans. Thus it is unconnected to your counterclaim.

I don't know how much simpler I can make it.

Find a quote that actually backs up your claim, or accept that you've learned something about the paper you told me to read.


He also never says that ~every adult human should pass, ever.

He never denied your claim, so you concluded you must be right. A most curious way of thinking.


> We know what we need right now: the next step. That step is a machine that, when it fails, fails in a human way.

I don't know if machines that become insecure and lash out are a good idea.


The issue is whether they lash out in some incomprehensible way, or lash out as an alien superintelligence. If they lash out as a human would, that's fine.

Depends on how much power the human has.

The super-AI is going to have power. Deployed everywhere, used by millions, etc.

You have two choices:

- It can potentially lash out in an alien-like way.

- It can potentially lash out in a human-like way.

Do you understand why this has no effect on the argument whatsoever? You are just introducing an irrelevant observation. I want the AI to behave like a human, always, no exceptions.

"What if it's a bad human"

Jesus. If people make an evil AI, then it doesn't matter how it behaves anyway; it's bad even before we get to the discussion about how it fails. Even when it accomplishes tasks successfully, it's bad.


> Do you understand why this has no effect on the argument whatsoever? You are just introducing an irrelevant observation. I want the AI to behave like a human, always, no exceptions.

Do you like how humans behave? Also, how DO humans behave? What kind of childhood should we give the AI? Daddy issues? Abused as a child? Neglected by a drug-addicted mother? Ruthlessly bullied in school?


We're discussing behavior in the context of a test (along the lines of the imitation game as defined by Alan Turing).

It's not a psychology exercise, my dude.


Of course it is. You seem adamant that you want them to behave in a human way. Humans have behavioural patterns that are influenced by their childhoods, and sometimes those are particularly messy.

So... you either wish to replicate that or you don't.


"behave in a human way" is a vague reference to a more specific, non-psychological idea that I presented earlier.

I just explained that to you. Either we discuss this in terms of the imitation game thought experiment, or we don't.


Why are human failure modes so special?

Because we have 300 thousand years of collective experience in dealing with humans.

Ironically, one of the ways that humans are worse than AI is that any given human learns from an even smaller fraction of that collective experience than AI already does.

I don't understand your point. How does that observation help in setting up a test or definition?

That's because it's not trying to do so. The observation is that humans are broadly unable to prepare for the failure modes of other humans, even when those failure modes have been studied and the results of those studies widely published. This means that while the failure modes of humans are indeed different from the failure modes of LLMs (and AI more broadly), these differences are not what I anticipate to be the most important next step in AI research.

Yep, humans suck in all kinds of ways. When AI gets better than us at dealing with it, then you can use that argument. That hasn't happened yet.

AIs are better than most humans at dealing with human suckage, for example because, unlike humans, LLMs have read all the literature about human suckage. But that's not relevant to what I was saying.

My point is: other failures of AI are more pressing. IMO it's the inefficiency with regard to examples. E.g., even cancelled or sold-off self-driving car projects (Uber's ATG) had more miles of experience than a professional human driver can get in an entire career, and look how bad that model was.

Making a self-driving car fail like a human means getting it distracted by something on the phone. Plus a bunch of other failure modes we should ignore, like "drunk" and "tired".

Even if you don't fully solve the example inefficiency, merely improving it will make a big difference to performance.


> for example because, unlike humans, LLMs have read all the literature about human suckage

No they haven't. If you read the CliffsNotes of a book, you haven't read that book. An LLM is a generalization over its entire training set; that's not what the word "reading" has ever meant.

The LLM does not "know" anything about human suckage or how to get around it, and will not use those "learnings" in its "thinking". The topic only comes up if the right nodes in its model trigger, and then it just generates tokens that match the "shape" of writing that was written with that knowledge.

A Bloom filter can be used to test for the presence of something in your DB, even with configurable probability (something LLMs massively lack), but a Bloom filter does not Know what is in your DB.
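For anyone who hasn't met the analogy: here's a toy Bloom filter sketch of my own (not from any particular library). It answers membership queries with a tunable false-positive rate while storing nothing you could call knowledge of the records themselves.

    import hashlib

    class BloomFilter:
        """Toy Bloom filter: answers 'possibly present' or 'definitely absent'.
        The false-positive rate is tuned via num_bits and num_hashes."""

        def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
            self.num_bits = num_bits
            self.num_hashes = num_hashes
            self.bits = [False] * num_bits

        def _positions(self, item: str):
            # Derive num_hashes bit positions from salted SHA-256 digests.
            for i in range(self.num_hashes):
                digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
                yield int(digest, 16) % self.num_bits

        def add(self, item: str) -> None:
            for pos in self._positions(item):
                self.bits[pos] = True

        def might_contain(self, item: str) -> bool:
            return all(self.bits[pos] for pos in self._positions(item))

    bf = BloomFilter()
    bf.add("alice@example.com")
    print(bf.might_contain("alice@example.com"))  # True
    print(bf.might_contain("bob@example.com"))    # almost certainly False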

When you fit a linear regression to a plot of free-fall speed over time, you get an equation for the acceleration of gravity, but you don't "Know" gravity, and that equation will not let you recover an actual generalizable model of gravity. That limited model will still get you most of the way to the Moon, though.
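Here's that analogy as a few lines of code (my own sketch; the "measurements" are synthetic):

    import numpy as np

    # Synthetic free-fall data: speed (m/s) sampled once per second, with a
    # little noise standing in for measurement error.
    t = np.arange(1.0, 6.0)
    rng = np.random.default_rng(0)
    v = 9.81 * t + rng.normal(0.0, 0.2, t.size)

    # Fit v = g*t + b: the slope recovers a number close to g...
    g_fit, b_fit = np.polyfit(t, v, 1)
    print(f"fitted g ~= {g_fit:.2f} m/s^2")

    # ...but the fit encodes nothing about *why* things fall. It won't generalize
    # to orbits, air resistance, or another planet; it only summarizes these points.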

Generally the next claim is "same as human brains", but no, that has not been proven and is not a given. "Neural networks" are named that way as marketing. They've never been an accurate simulation of actual animal neurons, and a single animal neuron has far more robust capabilities than even many artificial "neurons" interconnected. Consider how every neuron in an animal brain intrinsically swims in a bath of hormone gradients that can provide positional 3D information, and how the structure of those real neurons is at least partially shaped by thousands of generations of evolution and involves highly conserved sub-structures. Brains do not learn like neural nets do.


You appear to be arguing against a totem, not against what I actually wrote.

> AIs are better than most humans at dealing with human suckage

That is a valid opinion, but subjective. If I say that they're not better, we're going to be exchanging anecdotes and getting nowhere.

Hence the need for a less subjective way of evaluating AI's abilities.

> Making a self driving car fail like a human... "drunk" and "tired"

You don't understand.

It's not about making them present the same failure rate or personality defects as a human. Of course we want self-driving cars to make fewer errors and be better than us.

However, when they fail, we want them to fail like a good, sane human would instead of hallucinating gibberish that could catch other humans off guard.

Simplifying: it's better to have something that works 95% of the time and hallucinates in predictable ways the other 5% than something that works 99% of the time but hallucinates catastrophically in that 1%.
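As a back-of-the-envelope illustration (the failure rates are the ones above; the per-failure costs are invented purely for the sake of the example):

    # Hypothetical costs: a predictable, human-like failure is cheap to catch and
    # contain; a catastrophic, alien-like hallucination is not.
    predictable  = {"rate": 0.05, "cost": 1.0}
    catastrophic = {"rate": 0.01, "cost": 50.0}

    for name, mode in (("predictable", predictable), ("catastrophic", catastrophic)):
        print(f"{name}: expected cost per task = {mode['rate'] * mode['cost']:.2f}")
    # predictable:  0.05
    # catastrophic: 0.50  (the lower failure *rate* still costs ten times more)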

Stick to the more objective side of the discussion, not this anecdotal subjective talk that leads nowhere.



