I would argue that they are never led astray by chatting, but rather by accepting the projection of their own prompt passed through the model as some kind of truth.
When you talk with reasonable people, they have an intuition of what you want even if you don't say it, because there is a lot of non-verbal context. LLMs lack the ability to understand the person, but behave as if they had it.
This is right on the money. I use LLMs when I am reasonably confident the problem I am asking it is well-represented in the training data set and well within its capabilities (this has increased over time).
This means I use it as a typing accelerator when I already know what I want most of the time, not for advice.
I also use it as an exploratory tool sometimes, when I am sure others have solved a problem frequently, to have it regurgitate the average solution back at me so I can take a look. In those situations I never accept the diff as-is; I do the integration manually, to make sure my brain still learns along the way and I still add the solution to my own mental toolbox.
I mostly program in Python and Go, either services, API coordination (e.g. re-encrypt all the objects in an S3 bucket) or data analysis. But now I keep making little MPEGs or web sites without having to put in all that crap boilerplate from JavaScript. My stuff outputs JSON files or CSV files, and then I ask the LLM "Given a CSV file with this structure, please make a web site in Python that makes a spreadsheet-type UI with each column being sortable and a link to the raw data" and it just works.
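The result is usually something in the shape of this rough sketch (the file name, routes and column handling here are placeholders I made up, not the actual generated code): a small Flask app that renders the CSV as a table with clickable, sortable column headers and a link to the raw file.

    # Rough sketch: serve a CSV as a sortable HTML table plus a raw-data link.
    import csv
    from flask import Flask, request, render_template_string

    app = Flask(__name__)
    CSV_PATH = "results.csv"  # placeholder for whatever my tooling produced

    PAGE = """
    <table border="1">
      <tr>{% for col in headers %}
        <th><a href="?sort={{ col }}">{{ col }}</a></th>{% endfor %}
      </tr>
      {% for row in rows %}
      <tr>{% for col in headers %}<td>{{ row[col] }}</td>{% endfor %}</tr>
      {% endfor %}
    </table>
    <p><a href="/raw">raw data</a></p>
    """

    @app.route("/")
    def table():
        # Load the CSV and optionally sort by the column given in ?sort=...
        with open(CSV_PATH, newline="") as f:
            rows = list(csv.DictReader(f))
        headers = list(rows[0].keys()) if rows else []
        sort_key = request.args.get("sort")
        if sort_key in headers:
            rows.sort(key=lambda r: r[sort_key])
        return render_template_string(PAGE, headers=headers, rows=rows)

    @app.route("/raw")
    def raw():
        # Expose the untouched CSV as the "link to the raw data".
        with open(CSV_PATH) as f:
            return f.read(), 200, {"Content-Type": "text/plain"}

    if __name__ == "__main__":
        app.run(debug=True)

Sorting happens server-side via a query parameter, which is exactly what keeps it free of the JavaScript boilerplate I was complaining about.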
It's mostly a question of experience. I've been writing software long enough that when I give chat models some code and a problem, I can immediately tell if they understood it or if they got hooked on something unrelated. But junior devs will have a hell of a hard time, because the raw code quality that LLMs generate is usually top notch, even if the functionality is completely off.
> the raw code quality that LLMs generate is usually top notch, even if the functionality is completely off.
I'm not even sure what this is supposed to mean. It doesn't make syntax errors? Code that doesn't have the correct functionality is obviously not "top notch".
High quality code is not just correct syntax. In fact, if the syntax is wrong, the code isn't low quality; it simply doesn't work, and even interns could spot that by running it. But in professional software development environments you have many additional code requirements, like readability, maintainability, overall stability or general good-practice patterns. I've seen good engineers deliver high quality code that was still wrong because of some design oversight or misunderstanding - the exact same thing you see from current LLMs. Often you don't even know what is wrong with an approach until you see it cause a problem. But you should still deliver high quality code in the meantime if you want to be good at your job.
They will also keep going in circles when you rephrase the requirements, unless you keep expanding every prompt to restate everything they've already suggested that got rejected. While humans occasionally do this too (hey, short memories), LLMs are infuriatingly more prone to it.
A typical interaction with an LLM:
"Hey, how do I do X in Y?"
"That's a great question! A good way to do X in Y is Z!"
"No, Z doesn't work in Y. I get this error: 'Unsupported operation Z'."
"I apologize for making this mistake. You're right to point out Z doesn't work in Y. Let's use W instead!"
"Unfortunately, I cannot use W for company policy reasons. Any other option?"
"Understood: you cannot use W due to company policy. Why not try to do Z?"
"I just told you Z isn't available in Y."
"In that case, I suggest you do W."
"Like I told you, W is unacceptable due to company policy. Neither W nor Z work."
It's my experience that once you are in this territory, the LLM is not going to be helpful and you should abandon the effort to get what you want out of it. I can smell blood now when it's wrong; it'll just keep being wrong, cheerfully, confidently.
Yes, to be honest I've also learned to notice when it's stuck in an infinite loop.
It's just frustrating, but when I'm asking it something within my domain of expertise, of course I can notice, and either call it quits or start a new session with a radically different prompt.
This really grinds my gears. The technology is inherently faulty, but the relentless optimism about its future subtly hides that by turning every failure into the user's mistake instead.
Oh, you got a wrong answer? Did you try the new OpenAI v999? Did you prompt it correctly? It's definitely not the model, because it worked for me once last night...
Yeah, it probably "worked for me" because they spent a gazillion hours engaging in what the LLM fanbois call "prompt engineering", but you and I would call "engaging in endless iterative hacky work-arounds until you find a prompt that works".
Unless it's something extremely simple, the chances of an LLM giving you a workable answer on the first attempt are microscopic.
Most optimistic text generators do not consider repeating the stuff that was already rejected a desirable path forward. It might be the only path forward they're aware of, though.
In some contexts I got ChatGPT to answer "I don't know" when I crafted a very specific prompt stating that not knowing is an acceptable and even preferable answer to bullshitting. But it's hit and miss and doesn't always work; it seems LLMs simply aren't trained to admit ignorance, they almost always want to give a positive and confident answer.
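The prompt I used was along these lines (the exact wording and the model name below are just illustrative, and it is no guarantee the model will actually comply):

    # Sketch of the "I don't know is acceptable" prompt, via the OpenAI
    # Python SDK. Model name and wording are examples, not a recipe.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    SYSTEM_PROMPT = (
        "If you are not confident in your answer, reply exactly with "
        "'I don't know'. Admitting you don't know is acceptable and "
        "preferable to guessing or inventing details."
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Does library X support feature Y?"},
        ],
    )
    print(response.choices[0].message.content)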
Correct, and it wasn't fixed with more parameters. Reasoning models question their own output, and all of the current models can verify their sources online before replying. They are not perfect, but they are much better than they used to be, and in practice it's not an issue most of the time. I have seen the reasoning models correct their own output while it is being generated. Gemini 2.5 Pro, GPT-o3, Grok 3.