This is the real value of AI that, I think, we're just starting to get into. It's less about automating workflows that are inherently unstructured (I think that we're likely to continue wanting humans for this for some time).
It's more about automating workflows that are already procedural and/or protocolized, but where information gathering is messy and unstructured (e.g., some facets of law, health, and finance).
Using your dietician example: we often know quite well what types of foods to eat or avoid based on your nutritional needs, your medical history, your preferences, etc. But gathering all of that information requires a mix of collecting medical records, talking to the patient, etc. Once that information is available, we can execute a fairly procedural plan to put together a diet that will likely work for you.
These are cases for which I believe LLMs are actually very well suited, provided the solution can be designed in a way that limits hallucinations.
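To make the split concrete, here is a rough sketch of the shape I have in mind, where the LLM only handles the messy intake and the plan itself comes from a fixed, auditable procedure. Everything here (PatientProfile, extract_profile, build_meal_plan, and the rules themselves) is made up for illustration, not a real system or real dietary guidance:

    from dataclasses import dataclass, field

    @dataclass
    class PatientProfile:
        allergies: list[str] = field(default_factory=list)
        conditions: list[str] = field(default_factory=list)    # e.g. "type 2 diabetes"
        preferences: list[str] = field(default_factory=list)   # e.g. "vegetarian"

    def extract_profile(medical_records: str, intake_notes: str) -> PatientProfile:
        """The messy, unstructured step: in this design an LLM would read records
        and conversation notes and emit structured fields that a human can review
        before anything downstream runs. Stubbed out here."""
        raise NotImplementedError

    def build_meal_plan(profile: PatientProfile) -> list[str]:
        """The protocolized step: deterministic rules a dietitian could audit."""
        plan = ["whole grains", "leafy greens", "legumes", "salmon"]
        if "type 2 diabetes" in profile.conditions:
            plan = [f for f in plan if f != "whole grains"] + ["low-glycemic grains"]
        if "vegetarian" in profile.preferences:
            plan = [f for f in plan if f != "salmon"] + ["tofu", "lentils"]
        return [f for f in plan if f not in profile.allergies]

The part most likely to hallucinate never produces the final answer; it only fills in fields that a deterministic, reviewable step consumes.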
I recently tried looking up something about local tax law in ChatGPT. It confidently told me a completely wrong rule. There are lots of sources on this, but some of them probably spread misinformation without realizing it, and ChatGPT just treated it as correct. Since I always verify what ChatGPT spits out, it wasn't a big deal for me, just a reminder that it's garbage in, garbage out.
Yeah, I also find that LLMs very often say something wrong just because they found it on the internet. The problem is that we know not to trust a random website, but LLMs make wrong info more believable. So in some sense the problem is not exactly the LLM, since they just pick up on the wrong stuff that people or "people" have written, but they are really bad at catching these errors and particularly good at covering for them or backing them up.
O3's web research seems to have gotten much, much better than their earlier attempts at using the web, which I didn't like. It seems to browse in a much more human way (trying multiple searches, noticing inconsistencies, following up with more refined searches, etc).
But I wonder how it would do in a case like yours where there is conflicting information and whether it picks up on variance in information it finds.
I just asked o3 how to fill out a Form 8949 for a sale with an incorrect 1099-B basis that was not reported to the IRS. It said, with no caveats or hedging, and while explicitly acknowledging that it understood the basis was not reported, that you should put the incorrect basis in column (e) with adjustments in (f) and (g). The IRS instructions are clear (as much as IRS instructions can be...) that in this scenario you should put the correct basis directly in column (e).
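To spell out the rule I was expecting, here it is as a tiny sketch. The function name and dict keys are just illustrative, and I'm glossing over column (f) in the not-reported branch; the point is what goes in column (e):

    def form_8949_basis_columns(basis_reported_to_irs: bool,
                                basis_on_1099b: float,
                                correct_basis: float) -> dict:
        """Column entries when the 1099-B shows an incorrect basis.
        Column (h) is then computed as (d) - (e) + (g)."""
        if basis_reported_to_irs:
            # Basis reported to the IRS (Box A/D): keep the broker's incorrect
            # basis in column (e), use code B in (f), and fix it via (g).
            return {
                "col_e": basis_on_1099b,
                "col_f": "B",
                "col_g": basis_on_1099b - correct_basis,  # e.g. -50 if the broker understated basis by 50
            }
        # Basis NOT reported to the IRS (Box B/E): the correct basis goes
        # straight into column (e) and no adjustment amount is needed;
        # this is the branch o3 got wrong.
        return {"col_e": correct_basis, "col_f": "", "col_g": 0.0}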
I think this will be fixed by having LLMs trained not on the whole internet but on well-curated content. To me this feels like the internet in maybe 1993. You see the potential and it's useful. But a lot of work and experimentation has to be done to work out the use cases.
I think it’s weird to reject AI based on its current form.
"Hallucination" implies that the LLM holds some relationship to truth. Output from an LLM is not a hallucination, it's bullshit[0].
> Using your dietician example: we often know quite well what types of foods to eat or avoid based on your nutritional needs
No we don't. It's really complicated. That's why diets are popular and real dietitians are expensive. And I would know: I've had to use one to help me manage an eating disorder!
There is already so much bullshit in the diet space that adding AI bullshit (again, using the technical definition of bullshit here) only stands to increase the value of an interaction with a person with knowledge.
And that's without getting into what happens when brand recommendations are baked into the training data.
I find this way of looking at LLMs to be odd. Surely we all are aware that AI has always been probabilistic in nature. Very few people seem to go around talking about how their binary classifier is always hallucinating, but just sometimes happens to be right.
Just like every other form of ML we've come up with, LLMs are imperfect. They get things wrong. This is more of an indictment of yeeting a pure AI chat interface in front of a consumer than it is an indictment of the underlying technology itself. LLMs are incredibly good at doing some things. They are less good at other things.
There are ways to use them effectively, and there are bad ways to use them. Just like every other tool.
The problem is they are being sold as everything solutions. Never write code / google search / talk to a lawyer / talk to a human / be lonely again, all here, under one roof. If LLM marketing stayed in its lane as a creator of convincing text, we'd be fine.
This happens with every hype cycle. Some people fully buy into the most extreme of the hype, and other people reverse polarize against that. The first group ends up offsides because nothing is ever as good as the hype, but the second group often misses the forest for the trees.
There's no shortcut to figuring out what the truth of what a new technology is actually useful for. It's very rarely the case that either "everything" or "nothing" is the truth.
Very true. I think LLMs will be very good at confirming whatever bias you have. Want to find reasons why unpasteurized milk is good? Just ask an LLM. Want to find evidence to be an antivaxxer? Just ask an LLM!
> "Hallucination" implies that the LLM holds some relationship to truth. Output from an LLM is not a hallucination, it's bullshit[0].
I understand your perspective, but the intention was to use a term we've all heard to reflect the thing we're all thinking about. Whether or not this is the right term to use for scenarios where the LLM emits incorrect information is not relevant to this post in particular.
> No we don't. It's really complicated. That's why diets are popular and real dietitians are expensive.
No, this is not why real dietitians are expensive. Real dietitians are expensive because they go through extensive training and are a licensed (and thus supply-constrained) group. That doesn't mean they're operating without a grounding fact base.
Dietitians are not making up nutritional evidence and guidance as they go. They're operating on studies done over decades, across millions of people, to understand in general which foods are linked to which outcomes. Yes, the field evolves. Yes, it requires changes over time. But to suggest we "don't know" is inconsistent with the fact that we're able to teach dietitians how to construct diets in the first place.
There are absolutely cases in which the confounding factors for a patient are unique enough such that novel human thought will be required to construct a reasonable diet plan or treatment pathway for someone. That will continue to be true in law, health, finances, etc. But there are also many, many cases where that is absolutely not the case, the presentation of the case is quite simple, and the next step actions are highly procedural.
This is not the same as saying dietitians are useless, or physicians are useless, or attorneys are useless. It is to say that, due to the supply constraints of these professions, there are always going to be fundamental limits to the amount they can produce. But there is a credible argument to be made that if we can bolster their ability to deliver the common scenarios much more effectively, we might be able to unlock some of the capacity to reach more people.