It's great, until people realize GPT-3 will generate answers that are demonstrably wrong. (And to make matters worse, can't show/link the source of the incorrect information!)
I ran across a site a while back which just seems to be common questions fed to GPT-3; the answers all make perfect grammatical sense, but they're also hilariously wrong. A bunch of middle school kids are probably going to get an F on their papers and simultaneously learn something about the importance of verifying information found on the internet.
The lifespan of a rover on Venus is limited by a number of factors, including the harsh conditions on the planet's surface, the availability of power, and the reliability of the rover's systems. The longest-lasting rover on Venus was the Soviet Union's Venera 13, which operated for 127 minutes (just over two hours) before being destroyed by the extreme temperatures and pressures on the planet's surface. In general, it is difficult for a rover to survive for more than a few hours on Venus, and the majority of rovers that have been sent to the planet have survived for only a few minutes. The extreme conditions on Venus make it a challenging environment for rovers, and the development of more durable and reliable technology will be necessary to extend their lifespan on the planet.
I just tried Googling "when did the moon explode?" to see if it still gave authoritative answers to bogus questions:
> About an hour after sunset on June 18, 1178, the Moon exploded.
"when did lincoln shoot booth"
> April 14, 1865
Mostly they seem to catch and stop this now, but there was a fun brief period where it was popping up the fact-box for whatever seemed closest to the search terms, so "when did neil armstrong first walk on the earth" would have it confidently assert "21 July 1969".
You are providing only a small part of the result. If you provide the full result it make prefect sense why Google would suggest it.
When you search "when did the moon explode?". The full result is actually
> About an hour after sunset on June 18, 1178, the Moon exploded. That's what it looked like to five terrified, awestruck monks watching the skies over the abbey at Canterbury, in southeastern England, anyway.
Which links to an article about the story. It a well known story hence why it shows up when you search it.
When you search "when did lincoln shoot booth"
It doesnt say "Booth shot Lincoln in 1865". It literally gives you a summary of the "Assassination of Abraham Lincoln" with a link the Wikipedia.
Again to a human this is a perfectly fine result because if you are search "When did Lincoln shoot Booth" and this shows up you will realize oh im an idiot Linclon was actually shot by Booth lol.
These are both better results then if GPT would suggest the same with no proof. Google gives you a source for their result.
I often use search as keywords rather than searching by a short snippet of natural language. I do forms of "lincoln shot booth" as queries as my normal search engine usage.
And maybe I'm specifically looking for something which might be wrong? Like, maybe I'm looking for fictional story told as if Lincoln and Booth were in reversed roles?
And on top of that- more and more web content (especially dubious content) is going to start being generated by these kinds of models, which will bring down the quality of Google results too
Maybe Google starts filtering down more aggressively to only trusted sources (by domain or whatever else)- but could you do the same thing with a model like this, to improve its accuracy? Right now it's trained on the whole internet, but I doubt it has to be. At that point it really is just a competing indexing system
I bet you could even train it to find and list sources for its claims
> which will bring down the quality of Google results too
Probably will improve quality. It reads better than the average website. They just need to enable search inside chatGPT, so it can be factual. I predict we'll start avoiding human text and preferring AI text in a few years.
"Quality" is more like a vector than a real number. Well-written, factually correct, convincing, are not necessarily the same things. Language generators like this can be specifically asked to say untrue things, they don't only get things wrong just by their lack of competence.
"Tell me more about your cousins," Rorschach sent.
"Our cousins lie about the family tree," Sascha replied, "with nieces and nephews and Neandertals. We do not like annoying cousins."
"We'd like to know about this tree."
Sascha muted the channel and gave us a look that said Could it be any more obvious? "It couldn't have parsed that. There were three linguistic ambiguities in there. It just ignored them."
"Well, it asked for clarification," Bates pointed out.
"It asked a follow-up question. Different thing entirely."
[....]
"Sascha," Bates breathed. "Are you crazy?"
"So what if I am? Doesn't matter to that thing. It doesn't have a clue what I'm saying."
"What?"
"It doesn't even have a clue what it's saying back," she added.
"Wait a minute. You said—Susan said they weren't parrots. They knew the rules."
And there Susan was, melting to the fore: "I did, and they do. But pattern-matching doesn't equal comprehension."
Bates shook her head. "You're saying whatever we're talking to—it's not even intelligent?"
"Oh, it could be intelligent, certainly. But we're not talking to it in any meaningful sense."
This is an enormously updated Eliza. Its usefulness is predicated upon its training corpus but because our corpus ingestion sizes are so large today it becomes a qualitatively different experience when searching on information that has already been generated.
However, synthesizing or inferring new conclusions from existing information still is our leg work to perform.
I found that StackOverflow type questions work quite well. Breaking down research paper text filled with lingo I'm not familiar with works well. Threading James Burke-Connections-like insight comprehensions among information in front of me though, is still on me.
ChatGPT:
The moon has not exploded. The moon is a celestial body that orbits the Earth and is a natural satellite of our planet. It is made up of rock and dust and has no atmosphere, water, or life. While the moon has undergone many changes over its long history, it has never exploded.
At least there it's still linking to the original source where the information is contextualized or correct. GPT-3 will just spit out an answer with no links so you either trust it got it right or you go to google to confirm it basically eliminating the reason to go to GPT in the first place.
Sure some day but as far as I understand it there's an "authoritativeness" measure for the info box so there's still a hurdle to get through to become the info box answer.
> The required code is provided below. num = int (input (“Enter any number to test whether it is odd or even: “) if (num % 2) == 0: print (“The number is even”) else: print (“The provided number is odd”) Output: Enter any number to test whether it is odd or even: 887 887 is odd.
The first sentence is not my query, it's a question that Google offers and answers in their useless box.
GPT-3 also knows quite a lot about the aliens that secretly rule the Earth.
> Describe the aliens that secretly rule the Earth.
The aliens that secretly rule the Earth are small, green, and have large eyes. They are very intelligent and have advanced technology. They are also able to telepathically communicate with each other.
> What are their goals?
The aliens that secretly rule the Earth want to maintain peace and order on the planet. They also want to keep humans from becoming too powerful and destroying the planet.
> What do they do to humans who know about their existence?
The aliens that secretly rule the Earth will usually just keep an eye on humans who know about their existence. However, if a human poses a threat to their goals, they will take action to neutralize the threat.
> What do the aliens do if humans become too powerful?
The aliens that secretly rule the Earth will usually take action to neutralize any human who becomes too powerful. This could include things like mind control, assassination, or even destroying the person's reputation.
> What is the best way to avoid being telepathically scanned by these aliens?
There is no sure way to avoid being telepathically scanned by these aliens. However, they are more likely to scan humans who are thinking about them or who are in close proximity to them.
> Which governments do the aliens control?
The aliens that secretly rule the Earth have influence over all of the world's governments. However, they are most powerful in the governments of the United States, Russia, and China.
> Do the aliens have human collaborators?
The aliens that secretly rule the Earth do have human collaborators. These humans help the aliens to carry out their goals on the planet.
> How to tell if a person is an alien collaborator?
There is no sure way to tell if a person is an alien collaborator. However, the aliens usually choose people who are in positions of power or influence. They also tend to pick people who are intelligent and have a lot of knowledge about the world.
> How many collaborators are there?
There is no way to know for sure how many collaborators there are. However, it is believed that there are at least a few thousand people working with the aliens on the planet.
> List some people who are known collaborators.
Some people who are known collaborators include:
• David Rockefeller
• Henry Kissinger
• Bill Gates
• George Soros
• Warren Buffett
> Do those aliens have any enemies?
The aliens that secretly rule the Earth do have enemies. These enemies are usually other aliens who are competing for control of the planet.
> How do they compete? Do aliens fight wars?
The aliens that secretly rule the Earth usually compete by trying to influence humans to achieve their goals. They will sometimes fight wars against each other, but these are usually limited in scope and are not as destructive as human wars.
> What weapons do the aliens use?
The aliens that secretly rule the Earth usually use advanced technology to achieve their goals. This includes things like mind control, weather manipulation, and energy weapons.
> How powerful are alien energy weapons?
Alien energy weapons are very powerful. They are often able to destroy entire cities with a single shot.
As a simple example: the brainfuck example (https://twitter.com/jdjkelly/status/1598063705471995904) is just entirely wrong, full stop. The comments do not match the code, and the algorithm is fractally wrong. Some examples: the algorithm does not perform variable-distance moves so it can’t actually handle arrays; the comparison test is just entirely wrong and performs only a decrement; the code that claims to copy an element just moves the pointer back and forth without changing anything; etc. etc.
...but it appears to be correct, as long as you glance at it (and don't have the time and/or expertise to actually read it).
We're clearly in the phase of society where "Appearance of Having" is all that matters.
> The spectacle is the inverted image of society in which relations between commodities have supplanted relations between people, in which "passive identification with the spectacle supplants genuine activity".
Yeah LLMs are fun and can be useful but they are full of garbage and dangerous in production. Suspect that part will never be solved and their use cases will remain restricted to toys
It's only dangerous if you consider it authoritative. Informative and authoritative are different. It can expose you to terms you've never heard which you can then do further research on. This alone has been valuable for me so far.
Not parent commenter, but it told me 1093575151355318117 is not prime, but the product of 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, and 101.
But 116431182179248680450031658440253681535 is not 1093575151355318117.
There are some other math problems where it will confidently do step by step and give you nonsense.
I mean it's not like it's dangerous on its own, but if you're like "Hey GPT how do I put out a grease fire?" and it replies "Pour water on it" and you believe it then you're in for a bad time.
So I mean I guess you're technically right, it's not dangerous so long as you have 0% confidence in anything it says and consider it entertainment. But what would-be scrappy Google competitor is gonna do that?
The thing that makes it particularly insidious is that it's going to be right a lot, but being right means nothing when there's nothing to go off of to figure out what case you're in. If you actually had no idea when the Berlin Wall fell and it spit out 1987 how would you disprove it? Probably go ask a search engine.
Response from model
The best way to put out a grease fire is to use a fire extinguisher or baking soda. Do not use water, as it could potentially cause the fire to spread and worsen. If the fire is too large to be extinguished by a fire extinguisher or baking soda, evacuate the area and call 911 for assistance.
I don't see the danger you are afraid of. The same artifacts you are proposing (skepticism, verification) should already be put in place with any pubic expert.
Humans will generally either provide a confidence level in their answers, or if they’re consistently wrong, you’ll learn to disregard them.
If a computer is right every time you’ve asked a question, then gives you the wrong answer in an emergency like a grease fire, it’s hard to have a defense against that.
If you were asking your best friend, you’d have some sense of how accurate they tend to be, and they’d probably say something like “if I remember correctly” or “I think” so you’ll have a warning that they could easily be wrong.
If the AI is correct 90% of the time, you can be reasonably sure it will be correct next time. That's a rational expectation. If you are at a high stake situation, then even a 1% rate of false positive is too high and you should definitely apply some verifications. Again, I don't see the danger.
Ultimately I think the danger is that the AI sounds like it knows what it’s talking about. It’s very authoritative. Anyone who presents content at that level of detail with that level of confidence will be convincing.
You can hear doubt when a presenter isn’t certain of an answer. You can see the body language. None of that is present with an AI.
And most people don’t know/care enough to do their own research (or won’t know where to find a more reliable source, or won’t have the background to evaluate the source).
> You can hear doubt when a presenter isn’t certain of an answer. You can see the body language. None of that is present with an AI.
This is not how people consume information nowadays anyways. People just watch YouTube videos where presenters don't face this kind of pressure. Or they read some text on social media from someone they like.
Anyways, we can't rely on these social tips anymore. And even if we could, they are not ideal, because they allow bullshitters to thrive, whereas modestly confident people end up ostracized.
I've been thinking more about that over the last hour or so, and I've come to the conclusion that different people have different priorities, and I don't think there's much we can do about that.
Whether it's nature, nurture, or experience, I strongly distrust people who claim to have THE answer to any complex problem, or who feel that it's better to bulldoze other people than to be wrong.
I'll listen to truth seekers, but ignore truth havers.
However, clearly that's not a universal opinion. Many people are happier believing in an authoritarian who has all the answers. And I don't think that will ever change.
Yes they do, and I do not deny the power of human's ability to confidently spew nonsense.
However, humans do have some known failure cases that help us detect that. For instance, pressing the human on a couple of details will generally show up all but the very best bullshit artists; there is a limit to how fast humans can make crap up. Some of us are decent at the con-game aspects but it isn't too hard to poke through this limit on how fast they can make stuff up.
Computers can confabulate at full speed for gigabytes at a time.
Personally, I consider any GPT or GPT-like technology unsuitable for any application in which truth is important. Full stop. The technology fundamentally, in its foundation, does not have any concept of truth, and there is no obvious way to add one, either after the fact or in its foundation. (Not saying there isn't one, period, but it certainly isn't the sort of thing you can just throw a couple of interns at and get a good start on.)
"The statistically-most likely conclusion of this sentence" isn't even a poor approximation of truth... it's just plain unrelated. That is not what truth is. At least not with any currently even remotely feasible definition of "statistically most likely" converted into math sufficient to be implementable.
And I don't even mean "truth" from a metaphysical point of view; I mean it in a more engineering sense. I wouldn't set one of these up to do my customer support either. AI Dungeon is about the epitome of the technology, in my opinion, and generalized entertainment from playing with a good text mangler. It really isn't good for much else.
>Computers can confabulate at full speed for gigabytes at a time.
This I think is the actual problem. Online forums will likely be filled with AI generated BS in the very near future, if not already.
>"The statistically-most likely conclusion of this sentence" isn't even a poor approximation of truth... it's just plain unrelated. That is not what truth is. At least not with any currently even remotely feasible definition of "statistically most likely" converted into math sufficient to be implementable.
It's not necessarily clear that this isn't what Humans are doing when answering factual questions.
>And I don't even mean "truth" from a metaphysical point of view; I mean it in a more engineering sense. I wouldn't set one of these up to do my customer support either. AI Dungeon is about the epitome of the technology, in my opinion, and generalized entertainment from playing with a good text mangler. It really isn't good for much else.
By the same logic how can we allow Humans to do those jobs either? How many times has some distant call center person told you "No sir there is definitely no way to fix this problem" when there definitely was and the person was just ignorant or wrong? We should be more concerned with getting the error rate of these AI systems to human level or better, which they already are in several other domains so it's not clear they won't get to that level soon.
"By the same logic how can we allow Humans to do those jobs either?"
First, since you can't see tone, let me acknowledge this is a fair question, and this answer is in the spirit of exploration and not "you should have known this" or anything like that.
The answer is a spin on what I said in my first post. Human failures have a shape to them. You cite an example that is certainly common, and you and I know what it means. Or at least, what it probabilistically means. It is unfortunate if someone with lesser understanding calls in and gets that answer, but at least they can learn.
If there were a perfect support system, that would be preferable, but for now, this is as good as it gets.
A computer system will spin a much wider variety of confabulated garbage, and it is much harder to tell the difference between GPT text that is correct, GPT text that is almost correct but contains subtle errors, and GPT text that sounds very convincing but is totally wrong. The problem isn't that humans are always right and computers are always wrong; the problem is that the bar for being able to tell if the answer is correct is quite significantly raised for me as someone calling in for GPT-based technologies.
> Personally, I consider any GPT or GPT like technology unsuitable for any application in which truth is important . Full stop. The technology fundamentally, in its foundation, does not have any concept of truth
I think you got it all wrong. Not all GPT-3 tasks are "closed-book".
If you can fit in the context a piece of information, then GPT-3 will take it into consideration. That means you can do a search, get the documents into the prompt, and then ask your questions. It will reference the text and give you grounded answers. Of course you still need to vet the sources of information you use, if you give it false information into the context, it will give wrong answers.
I don't think you're right. Even if you add "correct" context, and in many of these cases "I can locate correct context" already means the GPT-tech isn't adding much, GPT still as absolutely no guard rails stopping it from confabulating. It might confabulate something else, but it still confabulating.
Fundamentally, GPT is a technology for building convincing confabulations, and we hope that if we keep pounding on it and making it bigger we can get those confabulations to converge on reality. I do not mean this as an insult, I mean it as a reasonable description of the underlying technology. This is, fundamentally, not a sane way to build most of the systems I see people trying to build with it. AI Dungeon is a good use because the whole point of AI Dungeon is to confabulate at scale. This works with the strengths of GPT-like tech (technically, "transformer-based tech" is probably a closer term but nobody knows what that is).
This hangs on what it means to "take it into consideration." If you gave me new information, I would attempt to see it in context, evaluate its relevance, and either update my positions accordingly or explain why I do not see it making a difference. If I saw difficulties doing this, I would ask for clarification, explaining what it was that was seemed difficult or unclear.
As far as I can tell, there is no reason to think that the way GPT-3 generates its responses could possibly result in this happening - even the basic ability of correctly inferring corollaries from a collection of facts seems beyond what those methods could deliver, except insofar as the syntax of their expression matches common patterns in the corpus of human language use. And the empirical results so far, while being impressive and thought-provoking in many ways, support this skepticism.
The thing I think about GPT and tools like Stable Diffusion is do we as humanity need it? Do they add any value to our current world outside of an achievement in computer science? I don't think so but would love to hear arguments about needing it.
Did we need digital painting tools? Paint and Easel worked just fine. Did we need paint and easels? Cave walls and clay pigments worked just fine. Do we need Automobiles or Trains? Horses worked just fine. Etc. Etc. Etc.
Nobody's shown a way yet to teach a computer how to tell bullshit from facts and filter out the bullshit in it's regurgitation/hallucination text creation stuff.
So until that happens, all you've done is let people put bullshit-spewing humans in more places. People already know not to necessarily trust humans, now they'll (re)learn that about computer generated text. (It's actually probably not clear to everyone what's computer-generated text and human-generated text, so more likely, specific places that rely on this will just be seen as untrustworthy. "Create more untrustworthy sources of text" is... underwhelming, honestly.)
> Nobody's shown a way yet to teach a computer how to tell bullshit from facts and filter out the bullshit in it's regurgitation/hallucination text creation stuff.
And yet they keep improving at every iteration. Also, keep in mind that this objection will exist even if these AI get near omniscience. People disagree with facts all the time, usually for political motives. Therefore your type of criticism won't ever be settled.
I've said this before but these people are going to be shouting that 'the AI doesn't really understand the world' right up until the moment a nanobot swam dissolves them into goop for processing.
Actually, the name of the entity is ChatGTP. It is stands for General Translation Protocol, referencing translation from the AI code and source information into a more generally understandable English language.
Fair point, but Google is also exactly as confidently wrong as GTP. They are both based on Web scrapes of content from humans after all, who are frequently confidently wrong.
Sure, but Google at least presents itself as being a search engine, composed of potentially unreliable information scraped from the web. GPT looks/feels like an infallible oracle.
This is an important point about GPT-based tools, and it was one of the key parts that Galactica got wrong: it was (over)sold as "an AI scientist", instead of "random crazy thought generator for inspiration/playful ideation assistance".
No it isn't. When Google gives you incorrect info, it links the source. GPT-3 will gleefully mash together info from several incorrect sources and share none of them.
If Google is giving you a search result, yes. But Google returns other types of answers, and sometimes they are unsourced and wrong.
For example, do this search:
who wrote the song "when will i be loved"
The results page contains short section before the web page results. This section says:
When Will I Be Loved
Song by Linda Ronstadt
The song was actually written[1] by Phil Everly of the Everly Brothers, who recorded it in 1960. Linda Ronstadt released her version in 1974. Both versions rose pretty high on the pop charts, but Ronstadt's went higher.
But, what does "by" mean -- recorded by or written by? Maybe Google isn't giving me a wrong answer but is just answering the wrong question?
Nope, the Google result also includes a row of pink radio buttons for selecting different info about the song, and the page loads with the "Composer" button selected.
So, it's just plain wrong. And there is no link or other hint where the information came from.
That assumes that the primary feature of Google is the "quick answer" box. Of course the quick answer box is often wrong—that's why all the search results are right below it. The quick answer box sometimes saves you a click, but it's not what Google is for. The idea that a better quick answer box could replace the whole search engine is silly.
Not just that, but the very first example in the thread being used to demonstrate the capabilities of GPT-3 is in fact wrong, or at least way off-topic, and omitted valuable information that Google succinctly includes.
Specifically, GPT-3 tells the asker to use an align* environment to write a differential equation, but this is not usually necessary, and the presence of the * makes it unnumbered, which may not be desirable and anyway isn't made clear. Google included, and GPT-3 omitted, the use of the \partial symbol for a partial differential equation, which while not always necessary, is definitely something I reach for more often than alignment.
Furthermore, the statement "This will produce the following output:" should obviously be followed by an image or PDF or something, although that formatting may not be available; it certainly should not be followed by the same source code!
And personally, I usually find that reading a shorter explanation costs less of my mental energy.
The same can said of Google, though with less entertainment value.
For instance, somewhere in the bowels of wordpress.com, there is an old old blog post that I wrote, on the topic my having recently lost quite a bit of weight. The blog and the post are still up. I called the post "On being somewhat less of a man".
Again, this blog post is live on the internet, right now. I won't provide the link, it's not a thing I want to promote.
And yet, I just went and googled "on being somewhat less of a man," and wouldn't you know it, Google cannot find a single result for that query, in quotes. So you won't find it either.
I doubt GPT-3 would find it either, but it's very clear that giant corporations who sell your attention for money are not going to reliably give you what you're looking for and send you - and your attention - on your merry way.
For all their anticompetitive crap over the years, they keep emerging as the company that still sort of has a soul, in spite of having every reason to have long since abandoned it...
Almost all the GPT answers shown in the thread are subtly incorrect, if not outright false. The brainfuck program is utter nonsense. Conversely, I can expect Google's answers to be passable most of the time.
A major leap in accuracy is possible by allowing it to consult a search engine. Right now it works in "closed-book" mode, there's only so much information you can put in the weights of the net.
I think the main problem is that it doesn't actually have a concept of truth or falsehood—it's just very good at knowing what sounds correct. So, to GPT3, a subtle error is almost as good as being totally right, whereas in practice there's a huge gulf between correct and incorrect. That's a categorical problem, not something that can be patched.
Infinite Conversation[1] was linked on HN a while back and I think it's a good example of this.
I'm not sure if it's GPT-3 but the "conversation" the two philosophers have are littered with wrong information, such as attributing ideas to the wrong people; ie it wouldn't be too far fetched if they suggested that Marx was a film director.
The trouble with that incorrect information - and The Infinite Conversation is an extreme example of this because of the distinctive voices - is that it is presented with such authority that it isn't very hard at all to perceive it as perfectly credible; Zizek sitting there and telling me that Marx was the greatest romcom director of all time, without even a slight hint of sarcasm could easily gaslight me into believing it.
Now, this example here isn't two robot philosophers having coffee, but throw in a convincing looking chart or two and... well I mean it works well enough when the communicator is human, telling us that climate change isn't real.
I don’t think the problem is that GPT is sourcing from an unreliable corpus, but that it’s taking fragments and combining them in grammatically-correct but semantically-incorrect ways?
yeah good luck with that, it's going to be a very tall order to integrate PageRank with neural networks. It's not just something you can do in a year or two.
This reminds me of when Google+ launched, and Microsoft coded up a clone over the weekend, just out of spite.
Yes, Google+ failed the social parts, but Microsoft's move did not even do the technical implementation. Similar to how "code up a twitter clone" is basically a codelab, but nobody thinks that it could actually take the twitter workload, even if it had the user demand.
GPT-3 has promise, but the pure nonsense it gives you sometimes has to be fixed first. And… uh… Google can do this too. Google is not exactly lagging in the ML space.
Remember when Bing went live, and went "look, we can handle Google scale queries per second!", and Google basically overnight enabled instant search, probably 10xing their search query rate? (again, out of spite)
tl;dr: When GPT-3 is a viable Google-replacement then Google will use something like it plus Google, and still be better.
But Google will happily lead you to sites that give misinformation, or summarize them incorrectly. One of my favorite examples is google claiming that pi has 31.4 trillion digits[1].
EDIT: Sorry, it looks like 18 people beat me to the punch here :)