I think the judge should not take into consideration anything about where the lawyer said they got the case information from.
When you go to court and cite previous cases, you are responsible for ensuring they are real cases. If you can't do that, what exactly is your job as a lawyer?
If someone hacked Westlaw and maliciously replaced opinions, I think it would not warrant censure of lawyers who cited Westlaw. Maybe a bad day with the judge, and they'd have to amend the filings. This is malpractice in a way that relying on an accepted, if technically fallible, resource isn't.
Big court case where it's pretty clear the plucky good guys aren't going to win. So they hire hackers to break into Westlaw and feed the opposition fake cases, then bait them into presenting those cases - and call them out on it.
>If you can't do that, what exactly is your job as a lawyer?
to convince some (judge/jury) that your client is the one to win the trial. ethics are meant to be the definer of how far to go in that cause with licensing boards being the ultimate decider if you've crossed the line and are allowed to continue in the legal practice.
no, but that driver might just be that much more successful if they get away with it. in fact, while not a mountain hairpin as your example, there was a recent race car driver that did a similar thing by intentionally using the wall as a push back to allow a maneuver that allowed for success.
so a clever comment attempting to prove a point is not always indicative of a proven point ;-)
No, but a winky smile after an intentionally inflammatory reply is usually indicative of someone I’m not interested in reading again. Doubly so when they refuse to use capital letters.
> ethics are meant to be the definer of how far to go (...)
No, the definition of "ethics" is "moral principles that govern a person's behaviour or the conducting of an activity."
If you feel compelled to resort to immoral behavior to win arguments, you already lost, and your contributions to society are a net negative.
Your argument is like stating that ethics in life are meant to be the definer of how far to go in that cause, with police departments being the ultimate decider of whether you've crossed the line. However, if you rob and murder random people without being caught by the police, you are undoubtedly an utter failure in life.
Case: Thompson v. Horizon Insurance Company,
Filing: Plaintiff's Motion for Class Certification.
Citation: The plaintiff's attorney cites the influential case of Johnson v. Horizon Insurance Company, 836 F.2d 123 (9th Cir. 1994), which established the standards for certifying class actions in insurance disputes. However, it has recently come to light that Johnson v. Horizon Insurance Company is a fabricated case that does not exist in legal records.
Case: Rodriguez v. Metro City Hospital,
Filing: Defendant's Motion to Exclude Expert Testimony.
Citation: The defense counsel references the landmark case of Sanchez v. Metro City Hospital, 521 U.S. 987 (2001), which set the criteria for admitting expert witness testimony in medical malpractice cases. However, it has now been discovered that Sanchez v. Metro City Hospital is a fictitious case and does not form part of legal precedent.
Case: Barnes v. National Pharmaceuticals Inc.,
Filing: Plaintiff's Response to Defendant's Motion for Summary Judgment.
Citation: The plaintiff's lawyer cites the well-known case of Anderson v. National Pharmaceuticals Inc., 550 F.3d 789 (2d Cir. 2010), which recognized the duty of pharmaceutical companies to provide adequate warnings for potential side effects. However, further investigation has revealed that Anderson v. National Pharmaceuticals Inc. is a fabricated case and does not exist in legal jurisprudence.
In any of those examples, when it was pointed out by the court that the cases do not seem to exist, did the attorneys then go and print out a fake case and file it with the court? That's what the atty here did. It's a bit further than what you list.
Wait, this is handled as "ChatGPT made up these cases" and not as "Lawyers deliberately used ChatGPT to fabricate stuff"? Is anyone really believing a lawyer is that stupid? I know, assume good intentions and all, but in this case, really?
I also believe lawyers who are stupid enough not to verify ChatGPT's responses should be treated as if they willfully lied to the court. "Oops, I didn't know" is a good defence when you're caught accidentally walking on the grass, not when you're in court.
This appears to be an "individual vs company" case, so I think expecting some cost-cutting on the individual's side is normal. Just not done this poorly.
> Is anyone really believing a lawyer is that stupid?
There's over a million lawyers in the United States.
You'd expect at least one of them to be a 1-in-a-million level of bad, or 4.7 standard deviations below the mean assuming a Gaussian distribution of competency.
An average person would normally never come across that lawyer in their lifetime, but media will find that lawyer and amplify their mistakes to everyone in the population.
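As a quick back-of-envelope check of that 4.7 figure (a minimal sketch that just inverts the standard normal CDF; obviously lawyer competence isn't really Gaussian):

    # How many standard deviations below the mean is a one-in-a-million tail,
    # if we pretend competence is normally distributed?
    from scipy.stats import norm

    p = 1e-6                 # one-in-a-million
    z = norm.ppf(p)          # z-score with probability p below it
    print(z)                 # about -4.75, i.e. roughly 4.7 SD below the mean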
There's no reason to believe it's a Gaussian distribution around the mean. Given that there are admission tests, you'd rather hope it's only the tail end of a Gaussian distribution, with the cutoff being what's required to pass the bar.
There's going to be a distribution around the mean of the proctoring of those tests. There may even be outright corruption and bribery going on at the tail end.
Is there a term for this "winning the lottery jackpot is unbelievably unlikely, but every week we hear about someone winning it" effect? People get it when the media reports on the lottery, but somehow miss the amplifying effect on pretty much any other topic.
There is Littlewood's Law: a miracle (defined as a one-in-a-million event) happens about once a month (assuming you observe one event every second and are observing your surroundings for 8 hours a day). There is also the law of truly large numbers: given a sufficiently large sample space, you will observe even very unlikely events.
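The arithmetic behind Littlewood's estimate is easy to check (assuming one observed event per second and 8 observing hours per day, as stated):

    # One "event" per second, 8 hours a day; how long until a 1-in-a-million event?
    events_per_day = 8 * 60 * 60            # 28,800 events per day
    days_per_miracle = 1_000_000 / events_per_day
    print(days_per_miracle)                 # ~34.7 days, i.e. roughly one miracle a month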
In an adversarial system, you know that at the very least your opponent is checking your cases (you should also assume the Judge/clerks are too, but I never practiced enough in State courts to know how well that holds. In Federal District, it's absolutely true). Usually it's for incredibly small things, like unquoted/cited distinguishing remarks, later cases reversing the decision, and misquotes. So a whole case not existing is going to stand out like crazy.
Occam's Razor here is that this person was lazy, ignorant, careless, stupid, or any combination of those. To be intentionally fraudulent in this circumstance is the equivalent of trying to steal a gun from a cop. You're fucking with the one person in society who definitely has the training, motivation, and willingness to stop you.
I can totally see a lawyer who hasn't been following closely reading headlines about ChatGPT acing the Bar Exam and starting to use it without being too skeptical.
What gets me is that they doubled down when asked to provide copies. Seriously, when that happens, you don't ask ChatGPT if the cases are real, you do your own damn search, and apologize profusely for your mistake. That really makes me question whether they were trying to pull a fast one, and then play dumb when caught, or if they really are that stupid.
Especially after they replied to the first queries with the assurance that the citations were real and one that couldn't be found was taken from an unpublished opinion. Negligent at best. Should be a career-ending move for the guy; he can always pivot into the burgeoning legal AI field.
It was his first try with the tech. He should have popped the cases into Westlaw and at least skimmed the briefs; he is at a big firm with that subscription. I'm not a lawyer and I at least have a clue about the legal research shortcuts. This guy must have been drinking and/or having an affair to have put so little effort into his first try using a new tech. I'm guessing there is a risk of humorous legal parody getting scooped up by the chat-bot.
He might not have thought it was a shortcut though.
If he thinks it is like querying a database and had never heard of hallucinations then this could just be an honest mistake. Especially if he thinks AI would be smarter than a database.
My first thought was that he was a mess in general, but we really don't have enough information. Like the other guy saying he cheated in life, it is pretty absurd to infer that.
My bet would be that he cheated his way through life and the bar somehow and landed the job on the basis of his looks and presentation or who he knew or something like that, so asking ChatGPT and cheating on it was more natural than using actual legal research tools.
You’re confusing stupidity with narrow expertise. I know CPAs who don’t file their own taxes because that wasn’t the focus of their niche. I know surgeons who don’t know the difference between a transistor and a resistor.
Plenty of smart people don’t understand how ChatGPT works or what its limitations are. A bunch of nerds built the best BS generator in history and marketed it as a super intelligent computer. If you ask it for relevant cases and it spits out a bunch of plausible information, is it really on them to know the tool is just really good at making things up?
There are plenty of cases of otherwise intelligent people trusting GPT output too much. For example the professor who asked ChatGPT if it wrote the homework students handed in, and took the "yes" at face value.
Context clues! Logic? Like, why would the lawyer intentionally have ChatGPT make up bogus cases? Did you read them? They don't make any sense at all and were not likely to trick anyone. He's clearly just being lazy. My suggestion is, unlike the ChatGPT attorney, you use your head and apply some thought to the situation. Imagine what you can now do with this power!
Check out one of the fake opinions: https://storage.courtlistener.com/recap/gov.uscourts.nysd.57.... It even makes up a panel comprising real federal appellate judges (although one is from the fifth circuit while the fake case is from the eleventh circuit). I can see how someone unfamiliar with what GPT can do could get fooled.
This is interesting. At the beginning of the document, we read that Susan Varghese, representing the estate of George Varghese, "appeals the district court's dismissal of her wrongful death claim". However, in the "Factual background" section, we read about someone named Anish Varghese who is alleging a breach of contract due to being denied boarding on a flight and having to purchase another ticket.
> The other five bogus cases were called Shaboon v. Egyptair, Petersen v. Iran Air, Martinez v. Delta Airlines, Estate of Durden v. KLM Royal Dutch Airlines, and Miller v. United Airlines.
> The plaintiff's lawyer continued to insist that the cases were real. LoDuca filed an affidavit on April 25 in which he swore to the authenticity of the fake cases
> LoDuca filed an affidavit on April 25 in which he swore to the authenticity of the fake cases
From the document another commenter linked above, it seems that affidavit is also dodgy:
"The April 25 affidavit was filled in response to the Orders of April 11 and 12, 2023, but is sworn to before a Notary Public on the 25th of January 2023."
To do that you first need to distrust AI, and a lot of people don't. They think of GPT like Google-but-written-in-English. That is a large part of the problem.
That's not a valid excuse, though. Lawyers are paid big bucks to think, not to assume. Otherwise you could do your litigation for free by just asking interested people on Twitter. I went to law school and had to drop out due to an injury & attendant medical costs; it's a crime (as in going to jail) for me to practice law without being licensed, no matter how good my work product might be.
There are probably thousands of lawyers who thought about using ChatGPT for their profession; most of them realized it lies and never got farther than that, and maybe a few hundred actually tried it out and also realized ChatGPT was lying. This is the one guy who managed to swiss-cheese-model his way through.
Well, leaving aside whether a lot of people don't distrust it, and whether a lawyer would or should be among those people, lawyers typically don't cite things without checking them first. I work at a biglaw firm, and I can't imagine it going unnoticed that the cases don't exist. This sounds like a midsize firm, which should be fairly similar.
There are a few small law firm/sole-prop type guys I have crossed paths with, however, for whom this kind of stupidity and carelessness would be on brand.
Guess he was just in a rush and figured this would be one of the 2/10 times he files something without at least taking a look at the opinions first, and it ended up being a massive error.
Exactly. This is how AI should be used. Maybe ChatGPT hallucinates a lot, but that doesn’t mean it’s useless. It can still provide a lot of good value, you just have to filter out the crap, just like you’d do with any other tool. Especially if you’re using it in this context, you should double-check its facts. I’d say it’s more the fault of the lazy lawyer who didn’t bother to verify than the fault of ChatGPT.
It was “What is always hungry, needs to be fed, and makes your hands red?” (Or something like that)
I asked for a hint about 5 times and it kept giving more legitimate sounding hints.
Finally I gave up and asked for the answer to the riddle, and it spit out a random fruit which made no sense as the answer to the riddle.
I then repeated the riddle and asked ChatGPT what the answer was, and it gave me the answer (“Fire”) which makes sense as the answer to the riddle.
But it was giving extremely bad hints, like “it starts with the letter P” and “it’s a fruit”.
That was a great way to show my non-tech family members the limitations of AI and why they shouldn’t trust it.
Playing “20 questions” with ChatGPT is another great way to expose its limitations. It knows the game and tries to play, but is terrible at asking questions to narrow down possible answers.
There really needs to be some confidence or accuracy score/estimation displayed alongside its output.
I've had a few moments with ChatGPT that are great anecdotes similar to your own:
- Asked it to generate a MadLib for me to play that was no more than a paragraph long. It produced something that was several paragraphs long. I told it "no. That's X paragraphs. I asked for one that is only 1 paragraph long" and it would respond "I'm sorry for the misunderstanding. Let me try again" and then would make the same mistake. It never got it right.
- Asked it, "Can you DM a game of Dungeons and Dragons?" and it said something like, "Yes! I'd love to DM a game of Dungeons and Dragons for you". Dumped some text to the screen about how we'd have to adapt it some. I asked it to begin, and it asked a few questions about the character I would want to play. I answered the few questions it asked. Then it finally dumped a page of text to the screen as "background" to my character and the quest I was going to embark on. Then it said something like, "You win. Good job! Hope you enjoyed your quest!"
I showed these to my family and they were all a little deflated about AI. Like they realized how willing it was to pretend to do what you wanted and just make up its own answers.
There's definitely a potential for a D&D DM with an LLM, but you'd need a lot of careful prompting and processing to handle the token limits today's models have. Simply put: a d&d game has more story and state than the 30,000-ish words an LLM can think about at once.
I think there's a lot of interesting opportunities there.
I've also heard (here) that after you get 20-ish questions into an instance you start getting really weird output. Some of the conjecture was that that's about how deep they trained.
In any case, if that's true, that's a very short role playing session, unless there's a good way to retain info but reset the state that accrues and causes problems (if indeed that happens).
Yeah, but you could ask the LLM to do a few different things at each step.
You could provide it with the background, the story, the secrets, and the summary of everything that has happened so far- as well as what new things have taken place. Then ask it to re-write the summary of the story so far.
Separately, you could give it all that context and what the players have asked of it, and ask what response to give.
As well, you could be recording all the events that have happened in a vector store, and do a search on it when players ask questions, and use those as context to the LLM when asking it what to reply.
There's lots of neat tricks we can use to help an LLM overcome its limitations.
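A minimal sketch of the loop being described, for the curious. Here llm() and embed() are hypothetical stand-ins for whatever completion and embedding APIs you'd actually use, and the seed summary is made up:

    import numpy as np

    event_log = []   # (embedding, text) pairs for everything that has happened so far
    summary = "The party has just arrived in the village of Hollowbrook."  # made-up seed

    def remember(text):
        event_log.append((embed(text), text))

    def recall(query, k=3):
        # crude cosine-similarity search over past events
        q = embed(query)
        scored = sorted(
            event_log,
            key=lambda e: -np.dot(e[0], q) /
                          (np.linalg.norm(e[0]) * np.linalg.norm(q) + 1e-9))
        return [text for _, text in scored[:k]]

    def dm_turn(player_input):
        global summary
        relevant = "\n".join(recall(player_input))
        reply = llm(
            f"You are the DM.\nStory so far: {summary}\n"
            f"Relevant past events:\n{relevant}\n"
            f"Player: {player_input}\nDM:")
        remember(f"Player: {player_input}\nDM: {reply}")
        # compress the running summary so it stays within the context window
        summary = llm(
            f"Rewrite this story summary to include the latest turn, "
            f"in under 200 words:\n{summary}\n{reply}")
        return reply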
Sounds like it was a success! I suppose it comes down to cost - I think it'd be fun to try a single player game authored like this and would be willing to use my own API token to try it out.
I find those examples quite encouraging, actually, in that they show that the current state of the art is still pretty far off from creating a planet-killing and/or species-ending AI.
I need to know what version of ChatGPT you were using, because this is a critical piece of information that everyone just blatantly ignores, and I can only imagine that it's out of ignorance of the significance of the difference.
This is what happened when I asked ChatGPT 4...
ME
Give me hints without outright telling me the answer to the riddle: "What is always hungry, needs to be fed, and makes your hands red?"
ChatGPT 4
Hint 1: This object can be found in many homes, usually in a specific room.
Hint 2: It is not a living creature, but it does "consume" things.
Hint 3: The part about hands turning red relates to a common physical response to a certain temperature.
Hint 4: It's a crucial tool for a common household task.
Hint 5: While it can't eat in the traditional sense, you do put things into it quite frequently.
ME
What's the answer?
ChatGPT 4
The answer to the riddle is a "fire" or "fireplace." It's always hungry for more fuel (like wood), needs to be fed to keep burning, and the heat from the fire can make your hands red.
If I were still able to edit my original comment, I would add a note at the bottom that says to take the experience as a casual person downloading an AI app after hearing about it on the news.
Such as a lawyer who’s not particularly tech savvy.
The main point is it’s irresponsible to trust LLM output for any critical/important purpose because it’s not perfect. But too many first time users think it is perfect and trustworthy at face value, when it’s not.
I don’t actually know the version since I was interacting via an unofficial iOS app using some LLM under the hood. It may not have even been ChatGPT.
Then why did you specifically claim that you asked ChatGPT? These distinctions, while small, are important. Not really the type of mistake to make for a tech-savvy forum.
Maybe, but it's surprisingly good in the face of all the non-version-indicating complaints about how terrible people think it is. Mostly I doubt that the lawyer was using GPT4, because the lawyer sounds like the kind of person who would be ignorant of the significance of the difference.
The kind of person too lazy to check the output of a computer program before submitting it to a court of law is the type of person too cheap to pay $20 for the good version of the program.
“Think about things that you might need to feed regularly, especially things that might leave stains on your hands. It might be something that you encounter frequently in your daily life.”
“ The answer to the riddle is "fire". Fire is always hungry for fuel, needs to be fed with wood or other combustible materials, and can make your hands red if you handle it or if you're cooking food over an open flame.”
And second time:
“This thing is a natural phenomenon that can occur in different situations and locations. It is often associated with warmth and light, and it can be both helpful and dangerous. People have been using it for various purposes for thousands of years.”
“The answer to the riddle is "fire". Fire is a natural phenomenon that needs fuel to keep burning, and it produces heat and light that can make your hands red. Fire has been used by humans for various purposes for thousands of years, such as cooking, heating, and providing light. However, fire can also be dangerous if not handled carefully.”
I am amazed at how a free bot is basically as good as, if not better than, OpenAI's bot. I am quite certain locally run LLMs will also be able to outcompete ChatGPT. Probably why Sam Altman is desperate to block them.
When the answer is something ridiculous or stupid, it's 95%+ of the time GPT-3.5-turbo and is rarely disclosed by the other party. GPT-4 is an order of magnitude better, if not two orders of magnitude better.
It's hard to tell if the party crapping on ChatGPT is doing so out of ignorance or malice.
Also, yes you can get correct information by tailoring your prompts, but that isn't the issue. The issue is that some prompts lead to bad results and confusing/incorrect answers. You changed what OP queried by providing the riddle and asking for hints to that riddle, whereas OP asked for a random riddle and then hints to that riddle.
It doesn't know that it doesn't know! It is, very roughly speaking, a model that is designed to print out the most likely word given its current input and training, and then the next word etc.
Whereas you or I might be mistaken about some of our faculties, memories and skills, ChatGPT cannot possibly "know" what its limitations are. It was never taught what it was not taught (obvs) nor can it conceive of what it was not taught about. It might come back with some useful chat in this regard but it cannot reason.
It seems that you don't know what you don't know, really.
There's no way to definitively know what properties ChatGPT has. It does seem to reason to some extent and it does often say that some information isn't known/there's no data. And it almost obnoxiously often tells you that it's simplifying a complex and multifaceted situation.
It's a model that takes an input and spits out the most likely output given its training.
"There's no way to definitively know what properties ChatGPT has." - yes there is: ask it how the war in Ukraine is progressing or some other time based thing. It stops in 2021.
I never mentioned humans or meat yet you decided to pop those words in my mouth (mmm tasty).
I merely tried to very quickly give a hint as to what ChatGPT is - it's an LLM, and that is something that takes a (normally textual) input and outputs an expected best-guess word for an answer, and then does it repeatedly until the full answer is returned.
I'd rather a confidence score for each response. The last thing I need is another reason for the AI to ignore the question or feel the need to explain why it was ignoring it.
1. The people designing it (either optimists or looking for a quick exit).
2. The learning set they're using, which I believe is some kind of internet crawl of sorts? I imagine humanity, as a whole, bullshits its way through most of its life.
It makes sense that there wouldn't be a lot of training data that consists of people admitting they don't know - forums like stack overflow wouldn't even permit such a post/comment anyway. So presumably ChatGPT is very heavily weighted against giving "I don't know" as an answer when virtually all the examples of people asking questions in its training data are followed by plausible sounding answers.
It is almost certainly case 2. People don't often write text on the internet just to say that they don't know. It doesn't even make sense to post that most of the time.
A great example would be on a Q/A forum or something like Stack Overflow. It's better to let someone else answer when you don't know.
> That was a great way to show my non-tech family members the limitations of AI and why they shouldn’t trust it.
These are the limitations of the version of ChatGPT you were using at that moment. They are not categorical limitations of AI or even LLMs.
It’s amazing to me how many people are sleeping on AI, mixing up the failing cases of a freemium chatbot for the full capability of the tech, even on HN. LLMs can say “I don’t know”. Even ChatGPT can do it. Ask some super niche historical questions of any version and see what you get. Is it perfect every time? No. But that’s something that can be reduced.
Over the next year, you’ll see more instances of lawyers citing hallucinated cases. There will also be a handful of startups that hook up LLMs to document stores, and they’ll be able to check for this sort of thing and do an even better job.
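A sketch of what that citation check might look like under the hood, assuming you have access to an index of real opinions. The regex is simplified and known_citations stands in for whatever case-law database such a tool would actually query:

    import re

    # Placeholder index of citations known to exist; a real tool would query
    # a case-law database (Westlaw, CourtListener, etc.) instead.
    known_citations = {"550 U.S. 544", "570 U.S. 338"}

    CITE_RE = re.compile(r"\b\d{1,4}\s+(?:U\.S\.|F\.\d?d|F\. Supp\. \d?d)\s+\d{1,4}\b")

    def flag_suspect_citations(brief_text):
        found = CITE_RE.findall(brief_text)
        return [c for c in found if c not in known_citations]

    draft = "See Varghese v. China Southern Airlines, 925 F.3d 1339 (11th Cir. 2019)."
    print(flag_suspect_citations(draft))   # -> ['925 F.3d 1339'], not in the index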
> LLMs can say “I don’t know”. Even ChatGPT can do it.
That's the problem in my opinion. When you know something is capable of saying "I don't know" but confidently spits out some hallucinated BS is when the average person eats it up.
I don't know exactly why, but for some reason this made me think of qAnon, and now I'm thinking of an AI trained on qAnon theories that people can form a community around like they did qAnon, and frankly that's one of the most terrifying things I've thought in quite a while.
What it does is not try to answer, but collect previous topics discussed by experts. Then answer the question based on the text, a far more reliable approach.
How does it qualify experts? I love the discussion here, but if it turns to international nuclear strategy or the minutiae of electrical networks (or presumably anything outside the regular wheelhouse) I notice that the quality goes down but the confidence stays the same.
Under the hood it builds and ranks the expertise of everyone in the network. That said, it doesn’t have knowledge outside the network, so if the network itself has low quality experts or no data, it’s going to give subpar results
Maybe the real definition of intelligence isn't being able to answer questions effectively, but being able to know what you know and know what you don't know.
There's truth to this, but in a time and place. There are times being very specific about what you know and don't know is important, but most of the time we are learning little by little, and we benefit from saying things we aren't entirely comfortable with, if only to figure out whether they are true and/or socially acceptable.
It can't. Correct me if I'm wrong - I probably am - but as a generative language model I was under the impression it has no actual understanding of the meaning of its words, only the probability they work well together.
You wouldn't expect dice to understand why you didn't want to roll snake eyes.
It could if it had enough training data that demonstrated "I don't know" is a perfectly valid answer to a question. The risk then is that it answers "I don't know" even when it is actually capable of providing a useful answer. In fact I've had it do that before too - not exactly "I don't know" but vague statements that aren't particularly helpful, when it turns out with the right prompt it "knew" exactly what the answer was.
I keep seeing this cop-out, which ignores that it's fundamentally the same architecture and has the same flaws. More wallpaper to hide the cracks makes it an even worse tool for these use cases, because all it does is fool more people into thinking it has capabilities that it fundamentally doesn't.
I don't think this is a fair argument. If we compare a GPT4 architecture with 5,000 parameters and a GPT4 architecture with 1 trillion parameters, should we judge the capabilities of both by the 5,000 parameter version, because they're both the same architecture?
There is more than architecture that can set them apart as well. GPT4 may have been trained by a slightly different algorithm, or on different data, and this can result in fundamentally different results.
Most of these conversations are not focused on one specific version, but are about the capabilities of LLMs in general, and it is implied we are talking about state-of-the-art LLMs, and GPT3 is no longer state-of-the-art.
This is nonsense. It's not a cop-out to say "use the latest, most capable model before complaining". Anyone remotely close to this field knows model size matters, amount of training data matters, quality of training data matters, and several other variables matter. Even if someone knows zero about it, just using 3.5 v 4 is enough to see they are two different things. Like a lizard v a human.
It's still fundamentally the same, hallucinates just the same, and anthropomorphizes itself as a confident, knowledgeable, intelligent being just the same. A newer, better, faster, more capable car still isn't an airplane, even if it goes fast enough to spend several seconds in the air.
Sure, and 40 year olds have the same capabilities as 4 year olds, because "same architecture" or "fundamentally the same". And putting random weights inside the GPT-4 model architecture should behave "fundamentally the same" as the trained GPT-4 weights, because it's "same architecture". Forget this "training" stuff.
It's not a person, it's a machine. And it's one that will still produce hallucinations that embarrassingly prove that it has no notion of intelligence, and do so confidently. That it does so less than its sibling is entirely irrelevant.
To me it's a bit like someone making the claim "humans are flawed, and we should think critically about the things they say", and someone responding with "well which human are you talking about? Because Einstein is orders of magnitude above the Walmart checkout guy".
I have asked it to write code. The code looked reasonable but did not work (using a non-existent API or library). I said the library XXX doesn't exist. And it would respond with an apology and a new solution that maybe worked maybe didn't. Sometimes it could pull off a save eventually.
I keep getting it to write code that works, but then its example usage makes no sense. I point out the code is right but the example it gives is wrong, and it can almost never fix it. Version 4.
> Playing “20 questions” with ChatGPT is another great way to expose its limitations. It knows the game and tries to play, but is terrible at asking questions to narrow down possible answers.
GPT4 did well enough for me. The questions below are asked by GPT4, and the yes/no answers are my responses to GPT4. I think it did quite well at 20 questions.
Is it a person? No.
Is it a place? No.
Is it a tangible object? Yes.
Is the object man-made? Yes.
Is the object typically found indoors? Yes.
Is it an electronic device? No.
Can it be held in one hand? Yes.
Is it used primarily in the kitchen? No.
Is it related to a specific hobby or activity? Yes.
Is it used in a sport or physical activity? No.
Is it used in creating or making something? No.
Is it associated with a board game or a card game? No.
Is the object related to music? Yes.
Is it a musical instrument? No.
Is it used to control or manipulate sound? Yes.
Is it typically used in the process of recording music? Yes.
Is it a type of microphone? No.
Is it used to alter or modify the sound of a musical instrument during the recording process? No.
Is it related to the control or adjustment of volume or sound levels? No.
Is the object used to store or convey recorded music? Yes.
Is it a vinyl record? No.
Is it a compact disc (CD)? Yes.
(22 questions total.)
GPT does nothing but predict the next word (token, to be more exact). If you ask it for hints, it's going to come up with some hints that sound like they could be plausible in context of the original riddle. If you then ask it what the answer was, it's going to try to take all the earlier hints into account, because that's now the text it's trying to complete.
There's no entity sitting on the other side of the screen thinking about the riddle's solution from the start. There's just a model that completes the given text one token at a time. You should expect similar results as if it was a bunch of people passing a single piece of paper to each other, each writing a single word on it, not communicating with each other in any other way and trying their best to come up with a coherent result. It's not going to make sense all the time.
> You should expect similar results as if it was a bunch of people passing a single piece of paper to each other, each writing a single word on it, not communicating with each other in any other way and trying their best to come up with a coherent result.
I'm pretty sure GPT-4's context window is bigger than '1 token'.
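For what it's worth, the loop being described is roughly this (a minimal sketch; next_token_distribution() is a hypothetical stand-in for a real model forward pass). The model conditions on the full context so far at each step, but still only ever emits one token at a time:

    import random

    def generate(prompt_tokens, max_new_tokens=50):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            # P(next token | every token so far), i.e. the whole context, not just one token
            probs = next_token_distribution(tokens)
            nxt = random.choices(list(probs), weights=list(probs.values()))[0]
            tokens.append(nxt)
            if nxt == "<eos>":
                break
        return tokens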
>Playing “20 questions” with ChatGPT is another great way to expose its limitations. It knows the game and tries to play, but is terrible at asking questions to narrow down possible answers.
The Rise of the Machines will be staved off as long as ChatGPT doesn't absorb the powers of the mighty Akinator
A colleague tried 20 questions with ChatGPT and the answer they'd chosen was "Margaret Thatcher" (UK Prime Minister, the "Iron Lady")
ChatGPT got as far as basically narrowing it down to post-War UK Prime Ministers, which is fairly impressive although it only had a few questions left. Then though it decided the answer must be "Winston Churchill". Churchill isn't meaningfully a post-War PM. He lost the July 1945 General Election, which was before the Pacific victory.
It did guess Maggie, with nothing left, at a point where I don't think it had ruled out Blair, Cameron or Heath, let alone say, Liz Truss, but guessing Churchill first shows the limitations of such a model.
Huh, it's probably that simple, I mean it doesn't explain why it thought it had definitely found the answer given there were a bunch more PMs to go, but yeah that does qualify Churchill.
It was an iOS app (ChatOn) not the official app. So probably GPT-3 (edit: the app claims to use GPT-4, so not sure)
After the riddle, I bought the $20/mo subscription via the official OpenAI app to try it on GPT-4. I started by trying to play “20 questions” but we couldn’t get past 10 questions before getting an error message “rate limit exceeded, try again in an hour”
I tried playing 20Q with the GPT-4 version, and it did surprisingly well. It didn't quite get it in 20 questions, and correctly gave up once it hit that, but it was almost at my object (a computer power supply).
Given what it was asking, I expect it would have gotten it in one or two more questions. It's just one sample, but it asked the right questions to narrow the field to computer-related things, and then narrowed down to power-related things, with its final guess being a UPS.
Is that 3.5 or 4? I don't play word games with GPT but for coding tasks it went from unusable most of the time to absolutely indispensible team member between 3.5 and 4.
> Playing “20 questions” with ChatGPT is another great way to expose its limitations. It knows the game and tries to play, but is terrible at asking questions to narrow down possible answers.
I began doing this last winter, and while it tends to be a bit slow I'm quite impressed that it can manage at all.
I think part of this is because GPT doesn’t have any “hidden variable” storage and doesn’t get any prep time up front to come up with something coherent. Just completes the next token based on the previous context.
You can give it prep time, tell it to reason out loud and it will write a paragraph (or two) about what it is thinking--or rather, the paragraph is its "thinking".
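A tiny illustration of that "prep time" idea. Here chat() is a hypothetical stand-in for whatever chat-completion API is in use; the point is just the difference between the two prompts:

    riddle = "What is always hungry, needs to be fed, and makes your hands red?"

    # Ask for the answer directly: the model commits to an answer immediately.
    direct = chat(f"Answer this riddle with a single word: {riddle}")

    # Ask it to reason out loud first: the written-out reasoning becomes part of
    # the context the final answer is conditioned on.
    with_prep = chat(
        f"Here is a riddle: {riddle}\n"
        "First, reason out loud about what each clue could refer to. "
        "Only after that, give your final one-word answer."
    )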
The interesting part is I've interacted with some people going through manic phases, and it's REALLY similar to ChatGPT: the ability to string words and thoughts together without any large-scale plan.
I played 20q three times with GPT4. Aluminum can, plastic straw, and tire pressure gauge were my items. It had no problems getting to the answers with sensible questions throughout.
I would hope so, at a minimum! I'm shocked that signing an affidavit in a court of law certifying the existence of made-up cases isn't simply criminal.
Almost certainly not. It's astoundingly difficult to get disbarred. I heard a lawyer say the only reliable way to get disbarred is to steal money from your clients. Judges rarely even use their inherent power to sanction, though he certainly will in this case.
Mens rea does not mean intent, it simply refers to the mental element of a crime. It can be intent or some lower bar such as recklessness or gross negligence, depending on how the crime is defined.
I almost wonder if it could be something like someone who deliberately slips on a grocery store floor to pull some legal shenanigans. Like I'm sure there are some people who, if they saw some spilled milk on a store floor, would be like 'it's my lucky day', and maybe similarly this lawyer saw a service that tells lies and could have been like 'this is too good to be true'.
Actually, I think he shouldn't be - if suitably scared and scarred, the guy will likely stay away from anything resembling AI/ML for the rest of his life.
This lawyer does not read news, and he is not skeptical of overhyped technology. He might learn to be wary of AI now, but the underlying issue, this appalling lack of critical thinking skill, isn't likely to change.
> Our goal is to get external feedback in order to improve our systems and make them safer.
> While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice.
This is the complete text of the first popup. There are three, each with a bit of text highlighting that this is an experimental service. There are emoji and there is an alert emoji next to the “incorrect/misleading” bit. They show up every time you visit.
I thought I had just seen a post yesterday [0], so I immediately thought how can there be 2 such lawyers? Competition on who would get caught first? It seems to be referring to the same case.
I don't see this as a particularly wild case of LLMs gone wrong. When "you have to take the bad with the good" includes a constant undermining of reality and an even more desperate need for fact-checkers, I wonder if it will have been worth it in the end. Likely not.
At minimum the sanctions should include a fine sufficient to cover the costs of everyone who had to deal with this: the court, opposing attorneys. Maybe punitive damages too. But let the guy keep his law license (unless there's a second offense).
So how do LLMs fit into the legal profession, if at all?
Do legal tools that make use of LLMs just need to come with big ol' disclaimers at the top saying, "This tool does not represent a legal opinion, please verify the output independently."?
I think LLMs should be used with basically the same stipulations in any field. The words it outputs are usually valid English, but not necessarily accurate, so it's good for brainstorming but needs to be fact-checked. Overall, whether it's useful depends on whether the time required for the latter is less than the time you save with the former. Personally, I've found them most useful as a way to provoke myself into Cunningham's Law, essentially relying on the fact that they make shit up. https://meta.wikimedia.org/wiki/Cunningham%27s_Law
At minimum, attorneys need to review the work's citations and wording for accuracy.
At the end of the day, this is not too different than LLMs consistently writing subtly broken code-- someone needs to comb through it and fix it.
We're currently in the phase where the potential of LLMs is suddenly appealing to many but where most people don't quite understand it's not really magic and that even after they evolve they will remain critically flawed. Expect serious growing pains as a result.
The same way they fit into all professions demanding accountability and precision - they can be used for exploration but everything they say should be verified.
Every citation created by chat gpt will be hallucinated. It knows what they look like. It doesn’t know what they are.
It doesn’t actually “know” anything. It is a statistical engine for generating the next reasonable looking word.
I asked an LLM "during the day, what colour is the sky?", and it told me the sky is typically blue, and that this has to do with sunlight scattering as it enters our atmosphere.
I think it's worth using "hallucinate" to more precisely refer to inaccurate content. I'd think of this in the same sense as a false-positive from some test result.
I like the term "bullshit" when used to mean "made without regard to the truth".
Usually the first thing a Judge's clerk will do before a judge reads a brief is pull all of the authority cited for the judge to have ready to review alongside the brief itself (if not analyzed, highlighted, summarized, etc. etc. depending on the judge).
Now, whether the judge actually reads them is debatable (I had my doubts sometimes). But you bet your ass that if the Clerk simply cannot find a case, the Judge will be informed of that.
YMMV in State courts, which can be all over the place in terms of professionalism. But you should at least assume your opponent is going to read your cases because the easiest way to beat someone in court is to point out the law you rely on is bad.
This is funny. However, GPTs are great linear interpolators between stories, and probably also between judgements. Maybe they will be useful to replace judges.
Sigh, I'm getting very sick of hearing about how "ChatGPT" makes stuff up. Yes, 3.5 made a lot of stuff up, 4.0 still does, but it's much rarer.
I wish people would mention this, it's all treated as the same thing. It's like talking about how unreliable these "Airplanes" are when they are talking about prop planes, even though jets are out.
If I had an airplane that usually flew just fine as long as I didn't use it for business, but would sometimes crash directly into the ground for no apparent reason, I would be talking about how unreliable it is too. Especially if the maker of the airplane touted its unusual benefits for searching for information in a business setting.
I'm so sick of seeing this strawman. "Well, ChatGPT 4 is better!" Well, how should I know? I'm not paying for ChatGPT 4. At least a product like Midjourney is consistent both in its paid and unpaid form.
Every post that is even remotely critical of ChatGPT has someone posting this exact sentiment like clockwork. It's probably just mindless fanboys, but I'm genuinely starting to wonder if it's some kind of astroturfed ad campaign for OpenAI.
The difference is vast and somewhat irritating to those with GPT-4 access today, but it’s also beside the point. The only people who deserve to cringe at the conflation are OpenAI’s marketing team, but even they may take solace in the sheer cut-through of column inches their cloven brand obtains, no doubt anticipating a future retirement of GPT-3.
None of this reporting or line of criticism is good and I completely agree, but it is worthwhile to remember that most users are using the free version and so to most people that's what ChatGPT is.