I think the judge should not take into consideration anything about where the lawyer said they got the case information from.
When you go to court and cite previous cases, you are responsible for ensuring they are real cases. If you can't do that, what exactly is your job as a lawyer?
If someone hacked Westlaw and maliciously replaced opinions, I think it would not warrant censure of lawyers who cited Westlaw. Maybe a bad day with the judge, and they'd have to amend the filings. This is malpractice in a way that relying on an accepted, if technically fallible, resource isn't.
Big court case where it's pretty clear the plucky good guys aren't going to win. So they hire hackers to break into Westlaw and feed the opposition fake cases, then bait them into presenting those cases - and call them out on it.
>If you can't do that, what exactly is your job as a lawyer?
to convince some (judge/jury) that your client is the one to win the trial. ethics are meant to be the definer of how far to go in that cause with licensing boards being the ultimate decider if you've crossed the line and are allowed to continue in the legal practice.
no, but that driver might just be that much more successful if they get away with it. in fact, while not a mountain hairpin as your example, there was a recent race car driver that did a similar thing by intentionally using the wall as a push back to allow a maneuver that allowed for success.
so a clever comment attempting to prove a point is not always indicative of a proven point ;-)
No, but a winky smile after an intentionally inflammatory reply is usually indicative of someone I’m not interested in reading again. Doubly so when they refuse to use capital letters.
> ethics are meant to be the definer of how far to go (...)
No, the definition of "ethics" is "moral principles that govern a person's behaviour or the conducting of an activity."
If you feel compelled to resort to immoral behavior to win arguments, you already lost, and your contributions to society are a net negative.
Your argument is like stating that ethics in life are meant to be the definer of how far to go in that cause, with police departments being the ultimate decider of whether you've crossed the line. However, if you rob and murder random people without being caught by the police, you are undoubtedly an utter failure in life.
Case: Thompson v. Horizon Insurance Company,
Filing: Plaintiff's Motion for Class Certification.
Citation: The plaintiff's attorney cites the influential case of Johnson v. Horizon Insurance Company, 836 F.2d 123 (9th Cir. 1994), which established the standards for certifying class actions in insurance disputes. However, it has recently come to light that Johnson v. Horizon Insurance Company is a fabricated case that does not exist in legal records.
Case: Rodriguez v. Metro City Hospital,
Filing: Defendant's Motion to Exclude Expert Testimony.
Citation: The defense counsel references the landmark case of Sanchez v. Metro City Hospital, 521 U.S. 987 (2001), which set the criteria for admitting expert witness testimony in medical malpractice cases. However, it has now been discovered that Sanchez v. Metro City Hospital is a fictitious case and does not form part of legal precedent.
Case: Barnes v. National Pharmaceuticals Inc.,
Filing: Plaintiff's Response to Defendant's Motion for Summary Judgment.
Citation: The plaintiff's lawyer cites the well-known case of Anderson v. National Pharmaceuticals Inc., 550 F.3d 789 (2d Cir. 2010), which recognized the duty of pharmaceutical companies to provide adequate warnings for potential side effects. However, further investigation has revealed that Anderson v. National Pharmaceuticals Inc. is a fabricated case and does not exist in legal jurisprudence.
In any of those examples, when it was pointed out by the court that the cases do not seem to exist, did the attorneys then go and print out a fake case and file it with the court? That's what the atty here did. It's a bit further than what you list.
Wait, this is handled as "ChatGPT made up these cases" and not as "Lawyers deliberately used ChatGPT to fabricate stuff"? Is anyone really believing a lawyer is that stupid? I know, assume good intentions and all, but in this case, really?
I also believe lawyers who are stupid enough not to verify ChatGPT's responses should be treated as if they willfully lied to the court. "Oops, I didn't know" is a good defence when you're caught accidentally walking on the grass, not when you're in court.
This appears to be an "individual vs company" case, so I think expecting some cost-cutting on the individual's side is normal. Just not done this poorly.
> Is anyone really believing a lawyer is that stupid?
There's over a million lawyers in the United States.
You'd expect at least one of them to be a 1-in-a-million level of bad, or 4.7 standard deviations below the mean assuming a Gaussian distribution of competency.
An average person would normally never come across that lawyer in their lifetime, but media will find that lawyer and amplify their mistakes to everyone in the population.
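As a quick back-of-envelope check of that 4.7 figure (a minimal sketch that just inverts the standard normal CDF; obviously lawyer competence isn't really Gaussian):

    # How many standard deviations below the mean is a one-in-a-million tail,
    # if we pretend competence is normally distributed?
    from scipy.stats import norm

    p = 1e-6                 # one-in-a-million
    z = norm.ppf(p)          # z-score with probability p below it
    print(z)                 # about -4.75, i.e. roughly 4.7 SD below the mean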
There's no reason to believe it's a Gaussian distribution around the mean. Given that there are admission tests, you'd rather hope it's only the tail end of a Gaussian distribution, with the cutoff being what's required to pass the bar.
There's going to be a distribution around the mean of the proctoring of those tests. There may even be outright corruption and bribery going on at the tail end.
Is there a term for this "winning the lottery jackpot is unbelievably unlikely, but every week we hear about someone winning it" effect? People get it when the media reports on the lottery, but somehow miss the amplifying effect on pretty much any other topic.
There is Littlewood's Law: a miracle (defined as a one-in-a-million event) happens about once a month (assuming you observe one event every second and are observing your surroundings for 8 hours a day). There is also the law of truly large numbers: given a sufficiently large sample space, you will observe even very unlikely events.
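The arithmetic behind Littlewood's estimate is easy to check (assuming one observed event per second and 8 observing hours per day, as stated):

    # One "event" per second, 8 hours a day; how long until a 1-in-a-million event?
    events_per_day = 8 * 60 * 60            # 28,800 events per day
    days_per_miracle = 1_000_000 / events_per_day
    print(days_per_miracle)                 # ~34.7 days, i.e. roughly one miracle a month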
In an adversarial system, you know that at the very least your opponent is checking your cases (you should also assume the Judge/clerks are too, but I never practiced enough in State courts to know how well that holds. In Federal District, it's absolutely true). Usually it's for incredibly small things, like unquoted/cited distinguishing remarks, later cases reversing the decision, and misquotes. So a whole case not existing is going to stand out like crazy.
Occam's Razor here is that this person was lazy, ignorant, careless, stupid, or any combination of those. To be intentionally fraudulent in this circumstance is the equivalent of trying to steal a gun from a cop. You're fucking with the one person in society who definitely has the training, motivation, and willingness to stop you.
I can totally see a lawyer who hasn't been following closely reading headlines about ChatGPT acing the Bar Exam and starting to use it without being too skeptical.
What gets me is that they doubled down when asked to provide copies. Seriously, when that happens, you don't ask ChatGPT if the cases are real, you do your own damn search, and apologize profusely for your mistake. That really makes me question whether they were trying to pull a fast one, and then play dumb when caught, or if they really are that stupid.
Especially after they replied to the first queries with the assurance that the citations were real and one that couldn't be found was taken from an unpublished opinion. Negligent at best. Should be a career-ending move for the guy; he can always pivot into the burgeoning legal AI field.
It was his first try with the tech. He should have popped the cases into Westlaw and at least skimmed the briefs; he is at a big firm with that subscription. I'm not a lawyer and I at least have a clue about the legal research shortcuts. This guy must have been drinking and/or having an affair to have put so little effort into his first try using a new tech. I'm guessing there is a risk of humorous legal parody getting scooped up by the chat-bot.
He might not have thought it was a shortcut though.
If he thinks it is like querying a database and had never heard of hallucinations then this could just be an honest mistake. Especially if he thinks AI would be smarter than a database.
My first thought was that he was a mess in general, but we really don't have enough information. Like the other guy saying he cheated in life, it is pretty absurd to infer that.
My bet would be that he cheated his way through life and the bar somehow and landed the job on the basis of his looks and presentation or who he knew or something like that, so asking ChatGPT and cheating on it was more natural than using actual legal research tools.
You’re confusing stupidity with narrow expertise. I know CPAs who don’t file their own taxes because that wasn’t the focus of their niche. I know surgeons who don’t know the difference between a transistor and a resistor.
Plenty of smart people don’t understand how ChatGPT works or what its limitations are. A bunch of nerds built the best BS generator in history and marketed it as a super intelligent computer. If you ask it for relevant cases and it spits out a bunch of plausible information, is it really on them to know the tool is just really good at making things up?
There are plenty of cases of otherwise intelligent people trusting GPT output too much. For example the professor who asked ChatGPT if it wrote the homework students handed in, and took the "yes" at face value.
Context clues! Logic? Like, why would the lawyer intentionally have ChatGPT make up bogus cases? Did you read them? They don't make any sense at all and were not likely to trick anyone. He's clearly just being lazy. My suggestion is, unlike the ChatGPT attorney, you use your head and apply some thought to the situation. Imagine what you can now do with this power!
Check out one of the fake opinions: https://storage.courtlistener.com/recap/gov.uscourts.nysd.57.... It even makes up a panel comprising real federal appellate judges (although one is from the fifth circuit while the fake case is from the eleventh circuit). I can see how someone unfamiliar with what GPT can do could get fooled.
This is interesting. At the beginning of the document, we read that Susan Varghese, representing the estate of George Varghese, "appeals the district court's dismissal of her wrongful death claim". However, in the "Factual background" section, we read about someone named Anish Varghese who is alleging a breach of contract due to being denied boarding on a flight and having to purchase another ticket.
> The other five bogus cases were called Shaboon v. Egyptair, Petersen v. Iran Air, Martinez v. Delta Airlines, Estate of Durden v. KLM Royal Dutch Airlines, and Miller v. United Airlines.
> The plaintiff's lawyer continued to insist that the cases were real. LoDuca filed an affidavit on April 25 in which he swore to the authenticity of the fake cases
> LoDuca filed an affidavit on April 25 in which he swore to the authenticity of the fake cases
From the document another commenter linked above, it seems that affidavit is also dodgy:
"The April 25 affidavit was filled in response to the Orders of April 11 and 12, 2023, but is sworn to before a Notary Public on the 25th of January 2023."
To do that you first need to distrust AI, and a lot of people don't. They think of GPT like Google-but-written-in-English. That is a large part of the problem.
That's not a valid excuse, though. Lawyers are paid big bucks to think, not to assume. Otherwise you could do your litigation for free by just asking interested people on Twitter. I went to law school and had to drop out due to an injury & attendant medical costs; it's a crime (as in going to jail) for me to practice law without being licensed, no matter how good my work product might be.
There are probably thousands of lawyers who thought about using ChatGPT for their profession; most of them realized it lies and never got farther than that, and maybe a few hundred actually tried it out and also realized ChatGPT was lying. This is the one guy who managed to swiss-cheese-model his way through.
Well, leaving aside whether a lot of people don't distrust it, and whether a lawyer would or should be among those people, lawyers typically don't cite things without checking them first. I work at a biglaw firm, and I can't imagine it going unnoticed that the cases don't exist. This sounds like a midsize firm, which should be fairly similar.
There are a few small law firm/sole-prop type guys I have crossed paths with, however, for whom this kind of stupidity and carelessness would be on brand.
Guess he was just in a rush and figured this would be one of the 2/10 times he files something without at least taking a look at the opinions first, and it ended up being a massive error.
Exactly. This is how AI should be used. Maybe ChatGPT hallucinates a lot, but that doesn’t mean it’s useless. It can still provide a lot of good value, you just have to filter out the crap, just like you’d do with any other tool. Especially if you’re using it in this context, you should double-check its facts. I’d say it’s more the fault of the lazy lawyer who didn’t bother to verify than the fault of ChatGPT.
It was “What is always hungry, needs to be fed, and makes your hands red?” (Or something like that)
I asked for a hint about 5 times and it kept giving more legitimate sounding hints.
Finally I gave up and asked for the answer to the riddle, and it spit out a random fruit which made no sense as the answer to the riddle.
I then repeated the riddle and asked ChatGPT what the answer was, and it gave me the answer (“Fire”) which makes sense as the answer to the riddle.
But it was giving extremely bad hints, like “it starts with the letter P” and “it’s a fruit”.
That was a great way to show my non-tech family members the limitations of AI and why they shouldn’t trust it.
Playing “20 questions” with ChatGPT is another great way to expose its limitations. It knows the game and tries to play, but is terrible at asking questions to narrow down possible answers.
There really needs to be some confidence or accuracy score/estimation displayed alongside its output.
I've had a few moments with ChatGPT that are great anecdotes similar to your own:
- Asked it to generate a MadLib for me to play that was no more than a paragraph long. It produced something that was several paragraphs long. I told it "no. That's X paragraphs. I asked for one that is only 1 paragraph long" and it would respond "I'm sorry for the misunderstanding. Let me try again" and then would make the same mistake. It never got it right.
- Asked it, "Can you DM a game of Dungeons and Dragons?" and it said something like, "Yes! I'd love to DM a game of Dungeons and Dragons for you". Dumped some text to the screen about how we'd have to adapt it some. I asked it to begin, and it asked a few questions about the character I would want to play. I answered the few questions it asked. Then it finally dumped a page of text to the screen as "background" to my character and the quest I was going to embark on. Then it said something like, "You win. Good job! Hope you enjoyed your quest!"
I showed these to my family and they were all a little deflated about AI. Like they realized how willing it was to pretend to do what you wanted and just make up its own answers.
There's definitely a potential for a D&D DM with an LLM, but you'd need a lot of careful prompting and processing to handle the token limits today's models have. Simply put: a d&d game has more story and state than the 30,000-ish words an LLM can think about at once.
I think there's a lot of interesting opportunities there.
I've also heard (here) that after you get 20-ish questions into an instance you start getting really weird output. Some of the conjecture was that that's about how deep they trained.
In any case, if that's true, that's a very short role playing session, unless there's a good way to retain info but reset the state that accrues and causes problems (if indeed that happens).
Yeah, but you could ask the LLM to do a few different things at each step.
You could provide it with the background, the story, the secrets, and the summary of everything that has happened so far- as well as what new things have taken place. Then ask it to re-write the summary of the story so far.
Separately, you could give it all that context and what the players have asked of it, and ask what response to give.
As well, you could be recording all the events that have happened in a vector store, and do a search on it when players ask questions, and use those as context to the LLM when asking it what to reply.
There's lots of neat tricks we can use to help an LLM overcome its limitations.
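A minimal sketch of the loop being described, for the curious. Here llm() and embed() are hypothetical stand-ins for whatever completion and embedding APIs you'd actually use, and the seed summary is made up:

    import numpy as np

    event_log = []   # (embedding, text) pairs for everything that has happened so far
    summary = "The party has just arrived in the village of Hollowbrook."  # made-up seed

    def remember(text):
        event_log.append((embed(text), text))

    def recall(query, k=3):
        # crude cosine-similarity search over past events
        q = embed(query)
        scored = sorted(
            event_log,
            key=lambda e: -np.dot(e[0], q) /
                          (np.linalg.norm(e[0]) * np.linalg.norm(q) + 1e-9))
        return [text for _, text in scored[:k]]

    def dm_turn(player_input):
        global summary
        relevant = "\n".join(recall(player_input))
        reply = llm(
            f"You are the DM.\nStory so far: {summary}\n"
            f"Relevant past events:\n{relevant}\n"
            f"Player: {player_input}\nDM:")
        remember(f"Player: {player_input}\nDM: {reply}")
        # compress the running summary so it stays within the context window
        summary = llm(
            f"Rewrite this story summary to include the latest turn, "
            f"in under 200 words:\n{summary}\n{reply}")
        return reply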
Sounds like it was a success! I suppose it comes down to cost - I think it'd be fun to try a single player game authored like this and would be willing to use my own API token to try it out.
I find those examples quite encouraging, actually, in that they show that the current state of the art is still pretty far off from creating a planet-killing and/or species-ending AI.
I need to know what version of ChatGPT you were using, because this is a critical piece of information that everyone just blatantly ignores, and I can only imagine that it's out of ignorance of the significance of the difference.
This is what happened when I asked ChatGPT 4...
ME
Give me hints without outright telling me the answer to the riddle: "What is always hungry, needs to be fed, and makes your hands red?"
ChatGPT 4
Hint 1: This object can be found in many homes, usually in a specific room.
Hint 2: It is not a living creature, but it does "consume" things.
Hint 3: The part about hands turning red relates to a common physical response to a certain temperature.
Hint 4: It's a crucial tool for a common household task.
Hint 5: While it can't eat in the traditional sense, you do put things into it quite frequently.
ME
What's the answer?
ChatGPT 4
The answer to the riddle is a "fire" or "fireplace." It's always hungry for more fuel (like wood), needs to be fed to keep burning, and the heat from the fire can make your hands red.
If I were still able to edit my original comment, I would add a note at the bottom that says to take the experience as a casual person downloading an AI app after hearing about it on the news.
Such as a lawyer who’s not particularly tech savvy.
The main point is it’s irresponsible to trust LLM output for any critical/important purpose because it’s not perfect. But too many first time users think it is perfect and trustworthy at face value, when it’s not.
I don’t actually know the version since I was interacting via an unofficial iOS app using some LLM under the hood. It may not have even been ChatGPT.
Then why did you specifically claim that you asked ChatGPT? These distinctions, while small, are important. Not really the type of mistake to make for a tech-savvy forum.
Maybe, but it's surprisingly good in the face of all the non-version-indicating complaints about how terrible people think it is. Mostly I doubt that the lawyer was using GPT4, because the lawyer sounds like the kind of person who would be ignorant of the significance of the difference.
The kind of person too lazy to check the output of a computer program before submitting it to a court of law is the type of person too cheap to pay $20 for the good version of the program.
“Think about things that you might need to feed regularly, especially things that might leave stains on your hands. It might be something that you encounter frequently in your daily life.”
“ The answer to the riddle is "fire". Fire is always hungry for fuel, needs to be fed with wood or other combustible materials, and can make your hands red if you handle it or if you're cooking food over an open flame.”
And second time:
“This thing is a natural phenomenon that can occur in different situations and locations. It is often associated with warmth and light, and it can be both helpful and dangerous. People have been using it for various purposes for thousands of years.”
“The answer to the riddle is "fire". Fire is a natural phenomenon that needs fuel to keep burning, and it produces heat and light that can make your hands red. Fire has been used by humans for various purposes for thousands of years, such as cooking, heating, and providing light. However, fire can also be dangerous if not handled carefully.”
I am amazed at how a free bot is basically as good as, if not better than, OpenAI's bot. I am quite certain locally run LLMs will also be able to outcompete ChatGPT. Probably why Sam Altman is desperate to block them.
When the answer is something ridiculous or stupid, it's 95%+ of the time GPT-3.5-turbo and is rarely disclosed by the other party. GPT-4 is an order of magnitude better, if not two orders of magnitude better.
It's hard to tell if the party crapping on ChatGPT is doing so out of ignorance or malice.
Also, yes you can get correct information by tailoring your prompts, but that isn't the issue. The issue is that some prompts lead to bad results and confusing/incorrect answers. You changed what OP queried by providing the riddle and asking for hints to that riddle, whereas OP asked for a random riddle and then hints to that riddle.
It doesn't know that it doesn't know! It is, very roughly speaking, a model that is designed to print out the most likely word given its current input and training, and then the next word etc.
Whereas you or I might be mistaken about some of our faculties, memories and skills, ChatGPT cannot possibly "know" what its limitations are. It was never taught what it was not taught (obvs) nor can it conceive of what it was not taught about. It might come back with some useful chat in this regard but it cannot reason.
It seems that you don't know what you don't know, really.
There's no way to definitively know what properties ChatGPT has. It does seem to reason to some extent and it does often say that some information isn't known/there's no data. And it almost obnoxiously often tells you that it's simplifying a complex and multifaceted situation.
It's a model that takes an input and spits out the most likely output given its training.
"There's no way to definitively know what properties ChatGPT has." - yes there is: ask it how the war in Ukraine is progressing or some other time based thing. It stops in 2021.
I never mentioned humans or meat yet you decided to pop those words in my mouth (mmm tasty).
I merely tried to very quickly give a hint as to what ChatGPT is - it's an LLM, and that is something that takes a (normally textual) input and outputs an expected best-guess word for an answer, and then does it repeatedly until the full answer is returned.
I'd rather a confidence score for each response. The last thing I need is another reason for the AI to ignore the question or feel the need to explain why it was ignoring it.
1. The people designing it (either optimists or looking for a quick exit).
2. The learning set they're using, which I believe is some kind of internet crawl of sorts? I imagine humanity, as a whole, bullshits its way through most of its life.
It makes sense that there wouldn't be a lot of training data that consists of people admitting they don't know - forums like stack overflow wouldn't even permit such a post/comment anyway. So presumably ChatGPT is very heavily weighted against giving "I don't know" as an answer when virtually all the examples of people asking questions in its training data are followed by plausible sounding answers.
It is almost certainly case 2. People don't often write text on the internet just to say that they don't know. It doesn't even make sense to post that most of the time.
A great example would be on a Q/A forum or something like Stack Overflow. It's better to let someone else answer when you don't know.
> That was a great way to show my non-tech family members the limitations of AI and why they shouldn’t trust it.
These are the limitations of the version of ChatGPT you were using at that moment. They are not categorical limitations of AI or even LLMs.
It’s amazing to me how many people are sleeping on AI, mixing up the failing cases of a freemium chatbot for the full capability of the tech, even on HN. LLMs can say “I don’t know”. Even ChatGPT can do it. Ask some super niche historical questions of any version and see what you get. Is it perfect every time? No. But that’s something that can be reduced.
Over the next year, you’ll see more instances of lawyers citing hallucinated cases. There will also be a handful of startups that hook up LLMs to document stores, and they’ll be able to check for this sort of thing and do an even better job.
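A sketch of what that citation check might look like under the hood, assuming you have access to an index of real opinions. The regex is simplified and known_citations stands in for whatever case-law database such a tool would actually query:

    import re

    # Placeholder index of citations known to exist; a real tool would query
    # a case-law database (Westlaw, CourtListener, etc.) instead.
    known_citations = {"550 U.S. 544", "570 U.S. 338"}

    CITE_RE = re.compile(r"\b\d{1,4}\s+(?:U\.S\.|F\.\d?d|F\. Supp\. \d?d)\s+\d{1,4}\b")

    def flag_suspect_citations(brief_text):
        found = CITE_RE.findall(brief_text)
        return [c for c in found if c not in known_citations]

    draft = "See Varghese v. China Southern Airlines, 925 F.3d 1339 (11th Cir. 2019)."
    print(flag_suspect_citations(draft))   # -> ['925 F.3d 1339'], not in the index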
> LLMs can say “I don’t know”. Even ChatGPT can do it.
That's the problem in my opinion. When you know something is capable of saying "I don't know" but confidently spits out some hallucinated BS is when the average person eats it up.
I don't know exactly why, but for some reason this made me think of qAnon, and now I'm thinking of an AI trained on qAnon theories that people can form a community around like they did qAnon, and frankly that's one of the most terrifying things I've thought in quite a while.
What it does is not try to answer, but collect previous topics discussed by experts. Then answer the question based on the text, a far more reliable approach.
How does it qualify experts? I love the discussion here, but if it turns to international nuclear strategy or the minutiae of electrical networks (or presumably anything outside the regular wheelhouse) I notice that the quality goes down but the confidence stays the same.
Under the hood it builds and ranks the expertise of everyone in the network. That said, it doesn’t have knowledge outside the network, so if the network itself has low quality experts or no data, it’s going to give subpar results
Maybe the real definition of intelligence isn't being able to answer questions effectively, but being able to know what you know and know what you don't know.
There's truth to this, but in a time and place. There are times being very specific about what you know and don't know is important, but most of the time we are learning little by little, and we benefit from saying things we aren't entirely comfortable with, if only to figure out whether they are true and/or socially acceptable.
It can't. Correct me if I'm wrong - I probably am - but as a generative language model I was under the impression it has no actual understanding of the meaning of its words, only the probability they work well together.
You wouldn't expect dice to understand why you didn't want to roll snake eyes.
It could if it had enough training data that demonstrated "I don't know" is a perfectly valid answer to a question. The risk then is that it answers "I don't know" even when it is actually capable of providing a useful answer. In fact I've had it do that before too - not exactly "I don't know" but vague statements that aren't particularly helpful, when it turns out with the right prompt it "knew" exactly what the answer was.
I keep seeing this cop-out, which ignores that it's fundamentally the same architecture and has the same flaws. More wallpaper to hide the cracks makes it an even worse tool for these use cases, because all it does is fool more people into thinking it has capabilities that it fundamentally doesn't.
I don't think this is a fair argument. If we compare a GPT4 architecture with 5,000 parameters and a GPT4 architecture with 1 trillion parameters, should we judge the capabilities of both by the 5,000 parameter version, because they're both the same architecture?
There is more than architecture that can set them apart as well. GPT4 may have been trained by a slightly different algorithm, or on different data, and this can result in fundamentally different results.
Most of these conversations are not focused on one specific version, but are about the capabilities of LLMs in general, and it is implied we are talking about state-of-the-art LLMs, and GPT3 is no longer state-of-the-art.
This is nonsense. It's not a cop-out to say "use the latest, most capable model before complaining". Anyone remotely close to this field knows model size matters, amount of training data matters, quality of training data matters, and several other variables matter. Even if someone knows zero about it, just using 3.5 v 4 is enough to see they are two different things. Like a lizard v a human.
It's still fundamentally the same, hallucinates just the same, and anthropomorphizes itself as a confident, knowledgeable, intelligent being just the same. A newer, better, faster, more capable car still isn't an airplane, even if it goes fast enough to spend several seconds in the air.
Sure, and 40 year olds have the same capabilities as 4 year olds, because "same architecture" or "fundamentally the same". And putting random weights inside the GPT-4 model architecture should behave "fundamentally the same" as the trained GPT-4 weights, because it's "same architecture". Forget this "training" stuff.
It's not a person, it's a machine. And it's one that will still produce hallucinations that embarrassingly prove that it has no notion of intelligence, and do so confidently. That it does so less than its sibling is entirely irrelevant.
To me it's a bit like someone making the claim "humans are flawed, and we should think critically about the things they say", and someone responding with "well which human are you talking about? Because Einstein is orders of magnitude above the Walmart checkout guy".
I have asked it to write code. The code looked reasonable but did not work (using a non-existent API or library). I said the library XXX doesn't exist. And it would respond with an apology and a new solution that maybe worked maybe didn't. Sometimes it could pull off a save eventually.
I keep getting it to write code that works, but then its example usage makes no sense. I point out the code is right but the example it gives is wrong, and it can almost never fix it. Version 4.
> Playing “20 questions” with ChatGPT is another great way to expose its limitations. It knows the game and tries to play, but is terrible at asking questions to narrow down possible answers.
GPT4 did well enough for me. The questions below are asked by GPT4, and the yes/no answers are my responses to GPT4. I think it did quite well at 20 questions.
Is it a person? No.
Is it a place? No.
Is it a tangible object? Yes.
Is the object man-made? Yes.
Is the object typically found indoors? Yes.
Is it an electronic device? No.
Can it be held in one hand? Yes.
Is it used primarily in the kitchen? No.
Is it related to a specific hobby or activity? Yes.
Is it used in a sport or physical activity? No.
Is it used in creating or making something? No.
Is it associated with a board game or a card game? No.
Is the object related to music? Yes.
Is it a musical instrument? No.
Is it used to control or manipulate sound? Yes.
Is it typically used in the process of recording music? Yes.
Is it a type of microphone? No.
Is it used to alter or modify the sound of a musical instrument during the recording process? No.
Is it related to the control or adjustment of volume or sound levels? No.
Is the object used to store or convey recorded music? Yes.
Is it a vinyl record? No.
Is it a compact disc (CD)? Yes.
(22 questions total.)
GPT does nothing but predict the next word (token, to be more exact). If you ask it for hints, it's going to come up with some hints that sound like they could be plausible in context of the original riddle. If you then ask it what the answer was, it's going to try to take all the earlier hints into account, because that's now the text it's trying to complete.
There's no entity sitting on the other side of the screen thinking about the riddle's solution from the start. There's just a model that completes the given text one token at a time. You should expect similar results as if it was a bunch of people passing a single piece of paper to each other, each writing a single word on it, not communicating with each other in any other way and trying their best to come up with a coherent result. It's not going to make sense all the time.
> You should expect similar results as if it was a bunch of people passing a single piece of paper to each other, each writing a single word on it, not communicating with each other in any other way and trying their best to come up with a coherent result.
I'm pretty sure GPT-4's context window is bigger than '1 token'.
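For what it's worth, the loop being described is roughly this (a minimal sketch; next_token_distribution() is a hypothetical stand-in for a real model forward pass). The model conditions on the full context so far at each step, but still only ever emits one token at a time:

    import random

    def generate(prompt_tokens, max_new_tokens=50):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            # P(next token | every token so far), i.e. the whole context, not just one token
            probs = next_token_distribution(tokens)
            nxt = random.choices(list(probs), weights=list(probs.values()))[0]
            tokens.append(nxt)
            if nxt == "<eos>":
                break
        return tokens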
>Playing “20 questions” with ChatGPT is another great way to expose its limitations. It knows the game and tries to play, but is terrible at asking questions to narrow down possible answers.
The Rise of the Machines will be staved off as long as ChatGPT doesn't absorb the powers of the mighty Akinator
A colleague tried 20 questions with ChatGPT and the answer they'd chosen was "Margaret Thatcher" (UK Prime Minister, the "Iron Lady")
ChatGPT got as far as basically narrowing it down to post-War UK Prime Ministers, which is fairly impressive although it only had a few questions left. Then though it decided the answer must be "Winston Churchill". Churchill isn't meaningfully a post-War PM. He lost the July 1945 General Election, which was before the Pacific victory.
It did guess Maggie, with nothing left, at a point where I don't think it had ruled out Blair, Cameron or Heath, let alone say, Liz Truss, but guessing Churchill first shows the limitations of such a model.
Huh, it's probably that simple, I mean it doesn't explain why it thought it had definitely found the answer given there were a bunch more PMs to go, but yeah that does qualify Churchill.
It was an iOS app (ChatOn) not the official app. So probably GPT-3 (edit: the app claims to use GPT-4, so not sure)
After the riddle, I bought the $20/mo subscription via the official OpenAI app to try it on GPT-4. I started by trying to play “20 questions” but we couldn’t get past 10 questions before getting an error message “rate limit exceeded, try again in an hour”
I tried playing 20Q with the GPT-4 version, and it did surprisingly well. It didn't quite get it in 20 questions, and correctly gave up once it hit that, but it was almost at my object (a computer power supply).
Given what it was asking, I expect it would have gotten it in one or two more questions. It's just one sample, but it asked the right questions to narrow the field to computer-related things, and then narrowed down to power-related things, with its final guess being a UPS.
Is that 3.5 or 4? I don't play word games with GPT but for coding tasks it went from unusable most of the time to absolutely indispensible team member between 3.5 and 4.
> Playing “20 questions” with ChatGPT is another great way to expose its limitations. It knows the game and tries to play, but is terrible at asking questions to narrow down possible answers.
I began doing this last winter, and while it tends to be a bit slow I'm quite impressed that it can manage at all.
I think part of this is because GPT doesn’t have any “hidden variable” storage and doesn’t get any prep time up front to come up with something coherent. Just completes the next token based on the previous context.
You can give it prep time, tell it to reason out loud and it will write a paragraph (or two) about what it is thinking--or rather, the paragraph is its "thinking".
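A tiny illustration of that "prep time" idea. Here chat() is a hypothetical stand-in for whatever chat-completion API is in use; the point is just the difference between the two prompts:

    riddle = "What is always hungry, needs to be fed, and makes your hands red?"

    # Ask for the answer directly: the model commits to an answer immediately.
    direct = chat(f"Answer this riddle with a single word: {riddle}")

    # Ask it to reason out loud first: the written-out reasoning becomes part of
    # the context the final answer is conditioned on.
    with_prep = chat(
        f"Here is a riddle: {riddle}\n"
        "First, reason out loud about what each clue could refer to. "
        "Only after that, give your final one-word answer."
    )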
The interesting part is I've interacted with some people going through manic phases, and it's REALLY similar to ChatGPT: the ability to string words and thoughts together without any large-scale plan.
I played 20q three times with GPT4. Aluminum can, plastic straw, and tire pressure gauge were my items. It had no problems getting to the answers with sensible questions throughout.
I would hope so, at a minimum! I'm shocked that signing an affidavit in a court of law certifying the existence of made-up cases isn't simply criminal.
Almost certainly not. It's astoundingly difficult to get disbarred. I heard a lawyer say the only reliable way to get disbarred is to steal money from your clients. Judges rarely even use their inherent power to sanction, though he certainly will in this case.
Mens rea does not mean intent, it simply refers to the mental element of a crime. It can be intent or some lower bar such as recklessness or gross negligence, depending on how the crime is defined.
I almost wonder if it could be something like someone who deliberately slips on a grocery store floor to pull some legal shenanigans. Like I'm sure there are some people who, if they saw some spilled milk on a store floor, would be like 'it's my lucky day', and maybe similarly this lawyer saw a service that tells lies and could have been like 'this is too good to be true'.
Actually, I think he shouldn't be - if suitably scared and scarred, the guy will likely stay away from anything resembling AI/ML for the rest of his life.
This lawyer does not read news, and he is not skeptical of overhyped technology. He might learn to be wary of AI now, but the underlying issue, this appalling lack of critical thinking skill, isn't likely to change.
> Our goal is to get external feedback in order to improve our systems and make them safer.
> While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice.
This is the complete text of the first popup. There are three, each with a bit of text highlighting that this is an experimental service. There are emoji and there is an alert emoji next to the “incorrect/misleading” bit. They show up every time you visit.
I thought I had just seen a post yesterday [0], so I immediately thought how can there be 2 such lawyers? Competition on who would get caught first? It seems to be referring to the same case.
I don't see this as a particularly wild case of LLMs gone wrong. When "you have to take the bad with the good" includes a constant undermining of reality and an even more desperate need for fact-checkers, I wonder if it will have been worth it in the end. Likely not.
At minimum the sanctions should include a fine sufficient to cover the costs of everyone who had to deal with this: the court, opposing attorneys. Maybe punitive damages too. But let the guy keep his law license (unless there's a second offense).
So how do LLMs fit into the legal profession, if at all?
Do legal tools that make use of LLMs just need to come with big ol' disclaimers at the top saying, "This tool does not represent a legal opinion, please verify the output independently."?
I think LLMs should be used with basically the same stipulations in any field. The words it outputs are usually valid English, but not necessarily accurate, so it's good for brainstorming but needs to be fact-checked. Overall, whether it's useful depends on whether the time required for the latter is less than the time you save with the former. Personally, I've found them most useful as a way to provoke myself into Cunningham's Law, essentially relying on the fact that they make shit up. https://meta.wikimedia.org/wiki/Cunningham%27s_Law
At minimum, attorneys need to review the work's citations and wording for accuracy.
At the end of the day, this is not too different than LLMs consistently writing subtly broken code-- someone needs to comb through it and fix it.
We're currently in the phase where the potential of LLMs is suddenly appealing to many but where most people don't quite understand it's not really magic and that even after they evolve they will remain critically flawed. Expect serious growing pains as a result.
The same way they fit into all professions demanding accountability and precision - they can be used for exploration but everything they say should be verified.
Every citation created by chat gpt will be hallucinated. It knows what they look like. It doesn’t know what they are.
It doesn’t actually “know” anything. It is a statistical engine for generating the next reasonable looking word.
I asked an LLM "during the day, what colour is the sky?", and it told me the sky is typically blue, and that this has to do with sunlight scattering as it enters our atmosphere.
I think it's worth using "hallucinate" to more precisely refer to inaccurate content. I'd think of this in the same sense as a false-positive from some test result.
I like the term "bullshit" when used to mean "made without regard to the truth".
Usually the first thing a Judge's clerk will do before a judge reads a brief is pull all of the authority cited for the judge to have ready to review alongside the brief itself (if not analyzed, highlighted, summarized, etc. etc. depending on the judge).
Now, whether the judge actually reads them is debatable (I had my doubts sometimes). But you bet your ass that if the Clerk simply cannot find a case, the Judge will be informed of that.
YMMV in State courts, which can be all over the place in terms of professionalism. But you should at least assume your opponent is going to read your cases because the easiest way to beat someone in court is to point out the law you rely on is bad.
This is funny. However, GPTs are great linear interpolators between stories, and probably also between judgements. Maybe they will be useful to replace judges.
Sigh, I'm getting very sick of hearing about how "ChatGPT" makes stuff up. Yes, 3.5 made a lot of stuff up, 4.0 still does, but it's much rarer.
I wish people would mention this, it's all treated as the same thing. It's like talking about how unreliable these "Airplanes" are when they are talking about prop planes, even though jets are out.
If I had an airplane that usually flew just fine as long as I didn't use it for business, but would sometimes crash directly into the ground for no apparent reason, I would be talking about how unreliable it is too. Especially if the maker of the airplane touted its unusual benefits for searching for information in a business setting.
I'm so sick of seeing this strawman. "Well, ChatGPT 4 is better!" Well, how should I know? I'm not paying for ChatGPT 4. At least a product like Midjourney is consistent both in its paid and unpaid form.
Every post that is even remotely critical of ChatGPT has someone posting this exact sentiment like clockwork. It's probably just mindless fanboys, but I'm genuinely starting to wonder if it's some kind of astroturfed ad campaign for OpenAI.
The difference is vast and somewhat irritating to those with GPT-4 access today, but it’s also beside the point. The only people who deserve to cringe at the conflation are OpenAI’s marketing team, but even they may take solace in the sheer cut-through of column inches their cloven brand obtains, no doubt anticipating a future retirement of GPT-3.
None of this reporting or line of criticism is good and I completely agree, but it is worthwhile to remember that most users are using the free version and so to most people that's what ChatGPT is.