The problem is asking them for facts. LLMs are not a database: they know stuff, but it's lossily compressed, so expect wrong facts, wrong names, wrong dates, wrong anything.
We will need an LLM as a front end: it generates a query to fetch the facts from the internet or a database, then maybe formats the facts for your consumption.
This is called Retrieval Augmented Generation (RAG). The LLM driver recognizes a query, it gets sent to a vector database or to an external system (could be another LLM...), and the answer is placed in the context. It's a common strategy to work around their limited context length, but it tends to be brittle. Look for survey papers.
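A minimal sketch of that loop, assuming hypothetical `embed`, `vector_store`, and `llm` interfaces (the names are made up; any embedding model, vector DB, and chat API slot in):

```python
# Minimal RAG loop: retrieve, stuff into context, generate.
def answer_with_rag(question, vector_store, embed, llm, k=3):
    # 1. Embed the question and pull the k most similar documents.
    query_vec = embed(question)
    docs = vector_store.search(query_vec, top_k=k)

    # 2. Place the retrieved facts in the model's context.
    context = "\n\n".join(doc.text for doc in docs)
    prompt = (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

    # 3. Let the LLM format the facts for consumption.
    return llm(prompt)
```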
That's exactly it. It's OK for LLMs to not know everything, because they _should_ have a means to look up information. What are some projects where this obvious approach is implemented/tried?
But then you need an LLM that can distinguish between grammar and facts. Current LLMs don't know the difference, and that is the main source of these issues: these models treat facts like grammar. That worked well enough to excite people, but it probably won't get us to a good state.
The weird thing with LLM hallucinations is that the model will usually acknowledge its mistake and correct itself if you call it out. My question is: why can't LLMs include a sub-routine to check themselves before answering?
Simply asking itself something like "this answer may not be correct, are you sure you're right?"
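You can bolt that sub-routine on from the outside. A minimal sketch, assuming a hypothetical `llm()` helper that wraps whatever chat API you use; as the replies below point out, the check itself is just another generation, so it isn't reliable:

```python
# Sketch of an external "are you sure?" pass. llm() is a hypothetical helper
# that sends a prompt to your model and returns its text reply.
def answer_with_self_check(question: str, llm) -> str:
    draft = llm(f"Question: {question}\nAnswer concisely.")
    verdict = llm(
        "This answer may not be correct, are you sure it's right?\n"
        f"Question: {question}\n"
        f"Proposed answer: {draft}\n"
        "Reply OK if it is correct, otherwise reply with a corrected answer."
    )
    # The verdict is itself just generated text, so this is best-effort only.
    return draft if verdict.strip().upper().startswith("OK") else verdict
```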
>The weird thing with LLM hallucinations is that the model will usually acknowledge its mistake and correct itself if you call it out.
From what I've tested, all of the current models will see a prompt like "are you sure that's correct" and respond "no, I was incorrect [here's some other answer]", irrespective of the accuracy of the original statement.
> My question is: why can't LLMs include a sub-routine to check themselves before answering?

Because LLMs, operated on their own, don't work in a way that makes that possible.
Here is the debug output of my local instance of Mistral-Instruct 8x7B. The prompt from me was 'What is poop spelled backwards?'. It answered 'puoP'. Let's see how it got there, starting with it processing my prompt into tokens:
It picked 'pu' even though that token had only a ~4% probability, then instead of picking 'op' it picked 'o'. The last token, 'P', had a 100% probability.
Output: puoP
At no time did it write 'puoP' as a complete word, nor does it know what 'puoP' is. It has no way of evaluating whether that is the right answer or not. You would need a different process to do that.
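If anyone wants to reproduce this kind of per-token trace, here is a rough sketch using Hugging Face transformers (not the parent's actual debug setup; the model name is just an example and any causal LM works):

```python
# Print the probability the model assigned to each token it generated.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # example; swap in your local model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "What is poop spelled backwards?"
inputs = tok(prompt, return_tensors="pt")

# Keep the raw scores for every generation step.
out = model.generate(
    **inputs,
    max_new_tokens=8,
    do_sample=True,
    output_scores=True,
    return_dict_in_generate=True,
)

gen_tokens = out.sequences[0][inputs["input_ids"].shape[1]:]
for step, (tok_id, scores) in enumerate(zip(gen_tokens, out.scores)):
    probs = torch.softmax(scores[0], dim=-1)
    print(step, repr(tok.decode(tok_id)), f"{probs[tok_id].item():.2%}")
```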
That is a common bullshitting strategy: talk a lot of bullshit, then backtrack and acknowledge you were wrong when people push back. That way they will think you know way more than you do. Some people will see through that, but most will just think you are a humble expert who can admit when you are wrong, rather than noticing that you always admit you are wrong even when you aren't.
People have a really hard time catching such bullshitting from humans, which is why free-form interviews don't work.
It's because there's no entity that is actually acknowledging anything. It's generating an answer to your prompt. You can gaslight it into treating anything as wrong or correct.
They simply don't work that way. You are asking it for an answer, and it will give you one, since all it can do is extrapolate from its training data.
Good prompting and certain adjustments to the text generation parameters might help prevent hallucinations, but it's not an exact science, since it depends on how the model was trained. Also, frankly, an LLM's training data contains a lot of bulls*t.
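To make "adjustments to the text generation parameters" concrete, here is one example using transformers' GenerationConfig; the values are illustrative guesses, not recommendations, and the right settings depend on the model and task:

```python
# Example of a more conservative sampling configuration.
from transformers import GenerationConfig

conservative = GenerationConfig(
    do_sample=True,
    temperature=0.3,        # lower temperature -> sharper, less "creative" sampling
    top_p=0.9,              # nucleus sampling: drop the long tail of unlikely tokens
    repetition_penalty=1.1, # discourage the model from looping on itself
    max_new_tokens=256,
)
# out = model.generate(**inputs, generation_config=conservative)
```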
> If I ask an LLM a very complex and specific question 500 times, if it just doesn't know the facts you'll still get the wrong answer 500 times.
I think the commenter meant using another model/LLM, which could give a different answer, then letting them vote on the result, like "old-fashioned AI" did with ensemble learning.
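Something like this, assuming a hypothetical `ask(model, question)` helper and a very naive notion of "the same answer":

```python
# Ensemble-style voting over several models (or several samples from one model).
from collections import Counter

def majority_answer(question, models, ask, samples_per_model=1):
    answers = []
    for model in models:
        for _ in range(samples_per_model):
            # Naive normalization; comparing free-form answers properly is the hard part.
            answers.append(ask(model, question).strip().lower())
    winner, _count = Counter(answers).most_common(1)[0]
    return winner
```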
That's understandable. The real problem is when the AI lies/hallucinates another answer with confidence instead of saying "I don't know".