This is the real value of AI that, I think, we're just starting to get into. It's less about automating workflows that are inherently unstructured (I think that we're likely to continue wanting humans for this for some time).
It's more about automating workflows that are already procedural and/or protocolized, but where information gathering is messy and unstructured (e.g., some facets of law, health, and finance).
Using your dietician example: we often know quite well what types of foods to eat or avoid based on your nutritional needs, your medical history, your preferences, etc. But gathering all of that information requires a mix of collecting medical records, talking to the patient, etc. Once that information is available, we can execute a fairly procedural plan to put together a diet that will likely work for you.
These are cases for which I believe LLMs are actually very well suited, provided the solution can be designed in a way that limits hallucinations.
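To make the split concrete, here is a rough sketch of the shape I have in mind, where the LLM only handles the messy intake and the plan itself comes from a fixed, auditable procedure. Everything here (PatientProfile, extract_profile, build_meal_plan, and the rules themselves) is made up for illustration, not a real system or real dietary guidance:

    from dataclasses import dataclass, field

    @dataclass
    class PatientProfile:
        allergies: list[str] = field(default_factory=list)
        conditions: list[str] = field(default_factory=list)    # e.g. "type 2 diabetes"
        preferences: list[str] = field(default_factory=list)   # e.g. "vegetarian"

    def extract_profile(medical_records: str, intake_notes: str) -> PatientProfile:
        """The messy, unstructured step: in this design an LLM would read records
        and conversation notes and emit structured fields that a human can review
        before anything downstream runs. Stubbed out here."""
        raise NotImplementedError

    def build_meal_plan(profile: PatientProfile) -> list[str]:
        """The protocolized step: deterministic rules a dietitian could audit."""
        plan = ["whole grains", "leafy greens", "legumes", "salmon"]
        if "type 2 diabetes" in profile.conditions:
            plan = [f for f in plan if f != "whole grains"] + ["low-glycemic grains"]
        if "vegetarian" in profile.preferences:
            plan = [f for f in plan if f != "salmon"] + ["tofu", "lentils"]
        return [f for f in plan if f not in profile.allergies]

The part most likely to hallucinate never produces the final answer; it only fills in fields that a deterministic, reviewable step consumes.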
I recently tried looking up something about local tax law in ChatGPT. It confidently told me a completely wrong rule. There are lots of sources on this, but some of them probably spread misinformation without realizing it, and ChatGPT just treated it as correct. Since I always verify what ChatGPT spits out, it wasn't a big deal for me, just a reminder that it's garbage in, garbage out.
Yeah, I also find that LLMs very often say something wrong just because they found it on the internet. The problem is that we know not to trust a random website, but LLMs make wrong info more believable. So in some sense the problem is not exactly the LLM, since they just pick up on the wrong stuff that people or "people" have written, but they are really bad at catching these errors and particularly good at covering for them or backing them up.
O3's web research seems to have gotten much, much better than their earlier attempts at using the web, which I didn't like. It seems to browse in a much more human way (trying multiple searches, noticing inconsistencies, following up with more refined searches, etc).
But I wonder how it would do in a case like yours where there is conflicting information and whether it picks up on variance in information it finds.
I just asked o3 how to fill out a Form 8949 for a sale with an incorrect 1099-B basis that was not reported to the IRS. It said, with no caveats or hedging, and while explicitly acknowledging that it understood the basis was not reported, that you should put the incorrect basis in column (e) with adjustments in (f) and (g). The IRS instructions are clear (as much as IRS instructions can be...) that in this scenario you should put the correct basis directly in column (e).
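To spell out the rule I was expecting, here it is as a tiny sketch. The function name and dict keys are just illustrative, and I'm glossing over column (f) in the not-reported branch; the point is what goes in column (e):

    def form_8949_basis_columns(basis_reported_to_irs: bool,
                                basis_on_1099b: float,
                                correct_basis: float) -> dict:
        """Column entries when the 1099-B shows an incorrect basis.
        Column (h) is then computed as (d) - (e) + (g)."""
        if basis_reported_to_irs:
            # Basis reported to the IRS (Box A/D): keep the broker's incorrect
            # basis in column (e), use code B in (f), and fix it via (g).
            return {
                "col_e": basis_on_1099b,
                "col_f": "B",
                "col_g": basis_on_1099b - correct_basis,  # e.g. -50 if the broker understated basis by 50
            }
        # Basis NOT reported to the IRS (Box B/E): the correct basis goes
        # straight into column (e) and no adjustment amount is needed;
        # this is the branch o3 got wrong.
        return {"col_e": correct_basis, "col_f": "", "col_g": 0.0}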
I think this will be fixed by having LLMs trained not on the whole internet but on well-curated content. To me this feels like the internet in maybe 1993. You see the potential and it's useful. But a lot of work and experimentation has to be done to work out the use cases.
I think it’s weird to reject AI based on its current form.
"Hallucination" implies that the LLM holds some relationship to truth. Output from an LLM is not a hallucination, it's bullshit[0].
> Using your dietician example: we often know quite well what types of foods to eat or avoid based on your nutritional needs
No we don't. It's really complicated. That's why diets are popular and real dietitians are expensive. And I would know: I've had to use one to help me manage an eating disorder!
There is already so much bullshit in the diet space that adding AI bullshit (again, using the technical definition of bullshit here) only stands to increase the value of an interaction with a person with knowledge.
And that's without getting into what happens when brand recommendations are baked into the training data.
I find this way of looking at LLMs to be odd. Surely we all are aware that AI has always been probabilistic in nature. Very few people seem to go around talking about how their binary classifier is always hallucinating, but just sometimes happens to be right.
Just like every other form of ML we've come up with, LLMs are imperfect. They get things wrong. This is more of an indictment of yeeting a pure AI chat interface in front of a consumer than it is an indictment of the underlying technology itself. LLMs are incredibly good at doing some things. They are less good at other things.
There are ways to use them effectively, and there are bad ways to use them. Just like every other tool.
The problem is they are being sold as everything solutions. Never write code / google search / talk to a lawyer / talk to a human / be lonely again, all here, under one roof. If LLM marketing stayed in its lane as a creator of convincing text, we'd be fine.
This happens with every hype cycle. Some people fully buy into the most extreme of the hype, and other people reverse polarize against that. The first group ends up offsides because nothing is ever as good as the hype, but the second group often misses the forest for the trees.
There's no shortcut to figuring out what the truth of what a new technology is actually useful for. It's very rarely the case that either "everything" or "nothing" is the truth.
Very true. I think LLMs will be very good at confirming whatever bias you have. Want to find reasons why unpasteurized milk is good? Just ask an LLM. Want to find evidence to be an antivaxxer? Just ask an LLM!
> "Hallucination" implies that the LLM holds some relationship to truth. Output from an LLM is not a hallucination, it's bullshit[0].
I understand your perspective, but the intention was to use a term we've all heard to reflect the thing we're all thinking about. Whether or not this is the right term to use for scenarios where the LLM emits incorrect information is not relevant to this post in particular.
> No we don't. It's really complicated. That's why diets are popular and real dietitians are expensive.
No, this is not why real dietitians are expensive. Real dietitians are expensive because they go through extensive training and are a licensed (and thus supply-constrained) group. That doesn't mean they're operating without a grounding fact base.
Dietitians are not making up nutritional evidence and guidance as they go. They're operating on studies done over decades, across millions of people, to understand in general which foods are linked to which outcomes. Yes, the field evolves. Yes, it requires changes over time. But to suggest we "don't know" is inconsistent with the fact that we're able to teach dietitians how to construct diets in the first place.
There are absolutely cases in which the confounding factors for a patient are unique enough such that novel human thought will be required to construct a reasonable diet plan or treatment pathway for someone. That will continue to be true in law, health, finances, etc. But there are also many, many cases where that is absolutely not the case, the presentation of the case is quite simple, and the next step actions are highly procedural.
This is not the same as saying dietitians are useless, or physicians are useless, or attorneys are useless. It is to say that, due to the supply constraints of these professions, there are always going to be fundamental limits to the amount they can produce. But there is a credible argument to be made that if we can bolster their ability to deliver the common scenarios much more effectively, we might be able to unlock some of the capacity to reach more people.