
I actually thought LLMs worked well (and I do a lot of LLM work) until a couple of days ago, when I started trying to do some weird things and ended up in hallucination land. It didn't matter what model I used.

Literally everything was hallucinated, even basic things like which named parameters a function takes.
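
To give a concrete sense of how basic the mistakes were: it's the kind of thing you can verify in a couple of lines with Python's inspect module. (Sketch only; the `pretty` parameter below is a made-up stand-in for the sort of argument that got invented, not the actual framework I was using.)

    import inspect, json

    def check_claimed_params(func, claimed):
        # Compare parameter names an LLM claims exist against the real signature.
        actual = set(inspect.signature(func).parameters)
        return {name: name in actual for name in claimed}

    # Suppose a model insists json.dumps takes a named parameter `pretty`:
    print(check_claimed_params(json.dumps, ["indent", "pretty"]))
    # {'indent': True, 'pretty': False}

The errors were that flat-out checkable; the parameters just didn't exist.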

It made me think that the core benefit of LLMs is that, even if they aren't smart, at least they've read what they need to answer your question. But if they haven't, because there isn't much data on the software framework, not many examples, and so on, then nothing else matters, and you can't really feed in the whole of vLLM as context either. You actually need the companies running the AI to come up with training exercises for it: train it on the code, train it on answering questions about the code, have it write up different simple variations of things in the code, and so on.
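
Back-of-the-envelope on why stuffing the whole codebase into context doesn't save you (rough sketch, assuming a local checkout of the vLLM repo at ./vllm and using the cl100k_base tokenizer as a rough stand-in for whatever the model actually uses):

    from pathlib import Path
    import tiktoken  # pip install tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    total = sum(
        len(enc.encode(p.read_text(errors="ignore")))
        for p in Path("vllm").rglob("*.py")
    )
    print(f"~{total:,} tokens of Python source alone")
    # far more than fits in a typical context window, before you even
    # add docs, tests, or the conversation itself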

So you really need to use LLMs to see their limits, and use them on 'weird' stuff: frameworks nobody expects anyone to mess with. Even being a researcher who fiddles with improving LLMs every day may not be enough to see the limits, because they arrive very suddenly, and then any accuracy or relevance goes away completely.


