Absolute definitions are weak. They won't settle anything.
We know what we need right now: the next step. That step is a machine that, when it fails, fails in a human way.
Humans also make mistakes, and hallucinate. But we do it as humans. When a human fails, you think "damn, that's a mistake that I or a friend of mine could have made".
LLMs, on the other hand, fail in a weird way. When they hallucinate, they demonstrate how non-human they are.
It has nothing to do with some special kind of interrogator. We must assume the best human interrogator possible. This next step I described works even with the most skeptical human interrogator possible. It also synergizes with the idea of alignment in ways other tests don't.
When that step is reached, humans may or may not figure out another characteristic that makes it evident that "subject X" is a machine and not a human, and a way to test it.
Moving the goalpost is the only way forward. Not all goalpost moves are valid, but the valid next move is a goalpost move. It's kind of obvious.
There's no substance to the idea of "superior intelligence". Nobody can say what that means, except by assuming that animal intelligence is in the same category as the kind we want and differs from human intelligence in degree rather than qualitatively, and then extrapolating forward on an intelligence meter that we don't actually have.
Besides which, we already defined "artificial intelligence" to mean non-intelligence: are we now going to attain "artificial general intelligence" by the same process? Should we add another letter to the acronym and move on to "genuine artificial general intelligence"?
Is there really no agreement on what intelligence refers to? I've seen it defined as the ability to reach a goal, which was clear to me. E.g. a chess AI rated 1500 Elo is more intelligent than one rated 1000.
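For what it's worth, the Elo gap in that example maps directly onto "ability to reach the goal". A minimal sketch using the standard Elo expected-score formula (the ratings are just the ones from the example above):

```python
# Expected score of player A against player B under the standard Elo model:
# E_A = 1 / (1 + 10^((R_B - R_A) / 400)).
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 1500-rated engine vs a 1000-rated one: expected score ~0.95,
# i.e. it reaches the goal (winning) far more reliably.
print(elo_expected_score(1500, 1000))
```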
That's capability; intelligence can also be how quickly it learned to reach that capability.
Consider the difference in intelligence between a kid who skipped five years of school vs one who was held back a year: if both got the same grade in the end, the one who skipped five years was smarter.
> Looking at it solely on rate of learning has LLMs way smarter than humans already which doesn't seem right to say
Sure, but "rate" also has two meanings, both useful, but importantly different: per unit of wall-clock time, and per example.
Transistors are just so much faster than synapses, that computers can (somewhat) compensate for being absolutely terrible by the latter meaning — at least, in cases where there's enough examples for them to learn from.
In cases where the supply of examples is too small (and cannot be enhanced with synthetic data, simulations and so on), state of the art AI models still suck. In cases where there is sufficient data, for example self-play in games of chess and go, the AI can be super-human by a substantial margin.
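To make the two meanings of "rate" concrete, here is a toy comparison with made-up numbers (nothing below is measured; the counts are purely illustrative):

```python
# Two ways to measure "learning rate": skill gained per hour of wall-clock time
# vs. skill gained per training example. All numbers are hypothetical.
def learning_rates(skill_gained: float, hours: float, examples: float):
    return skill_gained / hours, skill_gained / examples

# Hypothetical: a human reaches some fixed skill level after 1e3 examples spread
# over 100 hours; a model reaches the same level after 1e7 examples in 10 hours.
human_per_hour, human_per_example = learning_rates(1.0, hours=100, examples=1e3)
model_per_hour, model_per_example = learning_rates(1.0, hours=10, examples=1e7)

print(model_per_hour > human_per_hour)        # True: faster per unit of wall-clock time
print(model_per_example > human_per_example)  # False: far worse per example
```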
What's wrong with measuring and evaluating its outputs directly? If it can file taxes more accurately than us, does it matter whether it does it in a human manner?
If your definition of AGI is filing taxes, then it's fine.
Once we step into any other problem, we need to measure that other problem as well. Lots of problems are concerned with how an intelligent being could fail, and our society is built on lots of assumptions about that.
For _investment_ purposes the definition of AGI is very simple. It is: "to what extent can it replace human workers?".
From this perspective, "100% AGI" is achieved when AI can do any job that happens primarily on a computer. This can be extended to humanoid robots in the obvious way.
That's not what AGI used to mean a year or two ago. That's a corruption of the term, and using that definition of AGI is the mark of a con artist, in my experience.
I believe the classical definition is, "It can do any thinking task a human could do", but tasks with economic value (i.e. jobs) are the subset of that which would justify trillions of dollars of investment.
Industrial machines don't fail like humans yet they replaced human workers. Cars don't fail like horses yet they replaced them. ATMs don't fail like bank tellers... Why is this such a big requirement?
The thread we're in was arguing that the requirement for AGI is to fail the exact same way humans do. I used those examples to point out that failing the exact same way is not a requirement for a new technology to replace people or other technology. You're reading too much into what I said and putting words in my mouth.
What makes it tick is probably a more interesting question to me than to the AI skeptics. But they can't stop declaring a special quality (consciousness, awareness, qualia, reasoning, intelligence) that AI, by their definition, cannot ever have, and that this quality is immeasurable, unquantifiable, undefinable... This is literally a thought-stopping semantic dead end that I feel the need to argue against.
Finally, it doesn't make money in the same way Amazon or Uber didn't make money for a looong time: by generating lots of revenue, reinvesting it, and not caring about profit margins while in the growth stage. Will we seriously go through this for every startup? The industry is already at $10-20b a year at least, and that will keep growing.
AGI does not currently exist. We're trying to figure out what we want from it. Like a perfect microwave oven: if a company says they're going to make a perfect microwave oven, I want the crusty dough and delicious gratin-cheese effect on my cooked focaccia-inspired meals.
What exists is LLMs, transformers, etc. Those are the microwave oven that results in rubbery cheese and cardboard dough.
It seems that you are willing to cut some slack to the terrible microwave pizza. I am not.
You complained about immeasurable qualities, like qualia. However, I gave you a very simple measurable quality: failing like a decent human would instead of producing gibberish hallucinations. I also explained in other comments on this thread why that measurable quality is important (it plays into existing expectations, just like existing expectations about a good pizza).
While I do care about those more intangible characteristics (consciousness, reasoning, etc.), I decided to concede and exclude them from this conversation from the get-go. It was you that brought them back in, from who-knows-where.
Anyway. It seems that I've addressed your points fairly. You had to reach for other skeptic-related narratives in order to keep the conversation going, and by that point, you missed what I was trying to say.
> This next step I described works even with the most skeptical human interrogator possible.
To be a valid test, it still has to be passed by ~every adult human. The harder you make the test (in any direction), the more it fails on this important axis.
> A number of interrogators could be used, and statistics compiled to show how often the right identification was given
Turing says that we need enough competent-interrogator passes just to establish a statistical certainty, not ~everyone. I tend to agree with him on this.
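As a sketch of the kind of tally that section describes, you can compile how often interrogators made the right identification and put an interval around it; the trial counts below are made up for illustration:

```python
import math

# Hypothetical tallies from a series of imitation-game sessions.
trials = 200   # made-up number of interrogation sessions
correct = 128  # made-up number of correct "that's the machine" identifications

p_hat = correct / trials
# 95% confidence interval via the normal approximation.
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / trials)
print(f"correct-identification rate: {p_hat:.2f} +/- {half_width:.2f}")

# If the interval sits well above 0.5, interrogators reliably spot the machine;
# if it straddles 0.5, they do no better than chance.
```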
Please reread that section. You'll discover it has nothing to do with whether humans can pass the test.
If you can find a part of the paper in which Turing really does claim that it is unnecessary for most adult humans to be able to pass the test, by all means quote it. But this would be a surprising thing for him to claim, because it would undermine the entire foundation of his Imitation Game.
My original claim was that the Turing test needs to be passable by ~every adult human. You counterclaimed that Turing himself didn't think so, and provided that quote from the IG paper as evidence. But that quote is in a section about testing digital computers, not humans. Thus it is unconnected to your counterclaim.
I don't know how much simpler I can make it.
Find a quote that actually backs up your claim, or accept that you've learned something about the paper you told me to read.
The super-AI is going to have power. Deployed everywhere, used by millions, etc.
You have two choices:
- It can potentially lash out in an alien-like way.
- It can potentially lash out in a human-like way.
Do you understand why this has no effect on the argument whatsoever? You are just introducing an irrelevant observation. I want the AI to behave like a human, always, no exceptions.
"What if it's a bad human"
Jesus. If people make an evil AI, then it doesn't matter how it behaves anyway; it's just bad even before we get to the discussion about how it fails. Even when it accomplishes tasks successfully, it's bad.
> Do you understand why this has no effect on the argument whatsoever? You are just introducing an irrelevant observation. I want the AI to behave like a human, always, no exceptions.
Do you like how humans behave? Also, how DO humans behave? What kind of childhood should we give the AI? Daddy issues? Abused as a child? Neglected by a drug addicted mother? Ruthlessly bullied in school?
Of course it is. You seem adamant you want them to behave in a human way. Humans have behavioural patterns that are influenced by their childhoods, and sometimes those are particularly messy.
So... you either wish to replicate that or you don't.
Ironically, one of the ways that humans are worse than AI is that any given human learns from an even smaller fraction of that collective experience than AI already does.
That's because it's not trying to do so. The observation is that humans are broadly unable to prepare for the failure modes of other humans, even when those failure modes have been studied and the results of those studies widely published. This means that while the failure modes of humans are indeed different from the failure modes of LLMs (and AI more broadly), these differences are not what I anticipate to be the most important next step in AI research.
AI are better than most humans at dealing with human suckage, for example because unlike humans the LLMs have read all that literature about human suckage, but that's not relevant to what I was saying.
My point is: other failures of AI are more pressing. IMO the biggest is the inefficiency with regard to examples; e.g. even cancelled/sold-off self-driving car projects (Uber's ATG) had more miles of experience than a professional human driver can get in their entire career, and look how bad that model was.
Making a self driving car fail like a human means getting it distracted by something on the phone. Plus a bunch of other failure modes we should ignore like "drunk" and "tired".
Even if you don't fully solve the example inefficiency, merely improving it will make a big difference to performance.
>for example because unlike humans the LLMs have read all that literature about human suckage
No they haven't. If you read the CliffsNotes of a book, you haven't read that book. An LLM is a generalization over its entire training set; that's not what the word "reading" has ever meant.
The LLM does not "know" anything about human suckage or how to get around it, and will not use those "learnings" in its "thinking"; it will only come up if the right nodes in its model trigger, and then it just generates tokens that match the "shape" of writing that was written with that knowledge.
A Bloom filter can be used to test for the presence of something in your DB, with configurable false-positive probability even (something that LLMs massively lack), but a Bloom filter does not Know what is in your DB.
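For concreteness, a minimal Bloom filter sketch; the sizing uses the standard formulas, and the salted SHA-256 hashing is just one simple way to get k hash functions:

```python
import hashlib
import math

class BloomFilter:
    # Probabilistic membership test: answers "possibly present" or "definitely absent".
    # It never stores the items themselves, so it cannot say what is in the DB.
    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m = -n*ln(p) / (ln 2)^2 bits, k = (m/n)*ln(2) hash functions.
        self.m = max(1, int(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.k = max(1, round((self.m / expected_items) * math.log(2)))
        self.bits = [False] * self.m

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter(expected_items=1000, false_positive_rate=0.01)
bf.add("alice@example.com")
print(bf.might_contain("alice@example.com"))  # True
print(bf.might_contain("bob@example.com"))    # almost certainly False
```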
When you fit a linear regression to a plot of free-fall speed over time, you will have an equation for the acceleration of gravity, but you don't "Know" gravity, and that equation will not let you recover an actual, generalizable model of gravity. That limited model will still get you most of the way to the moon, though.
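In the same spirit, a least-squares fit over synthetic free-fall data recovers a number close to g while encoding nothing about gravity itself (the data below is generated with noise, not measured):

```python
import numpy as np

# Synthetic free-fall measurements: v = g*t plus measurement noise.
rng = np.random.default_rng(0)
t = np.linspace(0.1, 3.0, 30)
v = 9.81 * t + rng.normal(0.0, 0.2, size=t.size)

# The least-squares slope of v against t is an estimate of g.
g_estimate, _intercept = np.polyfit(t, v, 1)
print(f"estimated g ~ {g_estimate:.2f} m/s^2")

# The fit "knows" a slope, not gravity: no masses, no inverse-square law,
# nothing that generalizes beyond this one relationship.
```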
Generally the next claim is "same as human brains", but no, that has not been proven and is not a given. "Neural networks" are named that way as marketing. They've never been an accurate simulation of actual animal neurons, and a single animal neuron has far more robust capabilities than even many artificial "neurons" interconnected. Consider how every neuron in an animal brain intrinsically swims in a bath of hormone gradients that can provide positional 3D information, and how real neurons are at least partially structured by a thousand generations of evolution and involve highly conserved sub-structures. Brains do not learn like neural nets do.
> AI are better than most humans at dealing with human suckage
That is a valid opinion, but subjective. If I say that they're not better, we're going to be exchanging anecdotes and getting nowhere.
Hence, the need for a less subjective way of evaluating AI's abilities.
> Making a self driving car fail like a human... "drunk" and "tired"
You don't understand.
It's not about making them present the same failure rate or personality defects as a human. Of course we want self-driving cars to make fewer errors and be better than us.
However, when they fail, we want them to fail like a good, sane human would instead of hallucinating gibberish that could catch other humans off guard.
Simplifying: it's better to have something that works 95% of the time and hallucinates in predictable ways the other 5% than to have something that works 99% of the time but hallucinates catastrophically in that 1%.
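A back-of-the-envelope way to see the trade-off, with hypothetical per-failure costs (the specific numbers are assumptions; the point is the asymmetry):

```python
# Hypothetical costs per failure: a predictable, human-like failure is cheap to
# catch and correct; a catastrophic, alien hallucination is not.
predictable_failure_cost = 1.0
catastrophic_failure_cost = 1000.0

expected_cost_predictable = 0.05 * predictable_failure_cost    # fails 5% of the time, predictably
expected_cost_catastrophic = 0.01 * catastrophic_failure_cost  # fails 1% of the time, catastrophically

print(expected_cost_predictable, expected_cost_catastrophic)  # 0.05 vs 10.0
```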
Stick to the more objective side of the discussion, not this anecdotal subjective talk that leads nowhere.