I would argue that they are never led astray by chatting, but rather by accepting the projection of their own prompt passed through the model as some kind of truth.
When you talk with reasonable people, they have an intuition of what you want even if you don't say it, because there is a lot of non-verbal context. LLMs lack the ability to understand the person, but behave as if they had it.
This is right on the money. I use LLMs when I am reasonably confident the problem I am asking it is well-represented in the training data set and well within its capabilities (this has increased over time).
This means I use it as a typing accelerator when I already know what I want most of the time, not for advice.
I also use it as an exploratory tool sometimes, when I am sure others have solved a problem frequently, to have it regurgitate the average solution back at me so I can take a look. In those situations I never accept the diff as-is; I do the integration manually, to make sure my brain still learns along the way and I still add the solution to my own mental toolbox.
I mostly program in Python and Go, either services, API coordination (e.g. re-encrypt all the objects in an S3 bucket) or data analysis. But now I keep making little MPEGs or web sites without having to put in all that crap boilerplate from JavaScript. My stuff outputs JSON files or CSV files, and then I ask the LLM "Given a CSV file with this structure, please make a web site in Python that makes a spreadsheet-type UI with each column being sortable and a link to the raw data" and it just works.
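The result is usually something in the shape of this rough sketch (the file name, routes and column handling here are placeholders I made up, not the actual generated code): a small Flask app that renders the CSV as a table with clickable, sortable column headers and a link to the raw file.

    # Rough sketch: serve a CSV as a sortable HTML table plus a raw-data link.
    import csv
    from flask import Flask, request, render_template_string

    app = Flask(__name__)
    CSV_PATH = "results.csv"  # placeholder for whatever my tooling produced

    PAGE = """
    <table border="1">
      <tr>{% for col in headers %}
        <th><a href="?sort={{ col }}">{{ col }}</a></th>{% endfor %}
      </tr>
      {% for row in rows %}
      <tr>{% for col in headers %}<td>{{ row[col] }}</td>{% endfor %}</tr>
      {% endfor %}
    </table>
    <p><a href="/raw">raw data</a></p>
    """

    @app.route("/")
    def table():
        # Load the CSV and optionally sort by the column given in ?sort=...
        with open(CSV_PATH, newline="") as f:
            rows = list(csv.DictReader(f))
        headers = list(rows[0].keys()) if rows else []
        sort_key = request.args.get("sort")
        if sort_key in headers:
            rows.sort(key=lambda r: r[sort_key])
        return render_template_string(PAGE, headers=headers, rows=rows)

    @app.route("/raw")
    def raw():
        # Expose the untouched CSV as the "link to the raw data".
        with open(CSV_PATH) as f:
            return f.read(), 200, {"Content-Type": "text/plain"}

    if __name__ == "__main__":
        app.run(debug=True)

Sorting happens server-side via a query parameter, which is exactly what keeps it free of the JavaScript boilerplate I was complaining about.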
It's mostly a question of experience. I've been writing software long enough that when I give chat models some code and a problem, I can immediately tell if they understood it or if they got hooked on something unrelated. But junior devs will have a hell of a hard time, because the raw code quality that LLMs generate is usually top notch, even if the functionality is completely off.
> the raw code quality that LLMs generate is usually top notch, even if the functionality is completely off.
I'm not even sure what this is supposed to mean. It doesn't make syntax errors? Code that doesn't have the correct functionality is obviously not "top notch".
High quality code is not just correct syntax. In fact, if the syntax is wrong, the code isn't low quality; it simply doesn't work, and even interns could spot that by running it. But in professional software development environments you have many additional code requirements, like readability, maintainability, overall stability or general good-practice patterns. I've seen good engineers deliver high quality code that was still wrong because of some design oversight or misunderstanding - the exact same thing you see from current LLMs. Often you don't even know what is wrong with an approach until you see it cause a problem. But you should still deliver high quality code in the meantime if you want to be good at your job.
They will also keep going in circles when you rephrase the requirements, unless you keep expanding every prompt to restate everything they've already suggested that got rejected. While humans occasionally do this too (hey, short memories), LLMs are infuriatingly more prone to it.
A typical interaction with an LLM:
"Hey, how do I do X in Y?"
"That's a great question! A good way to do X in Y is Z!"
"No, Z doesn't work in Y. I get this error: 'Unsupported operation Z'."
"I apologize for making this mistake. You're right to point out Z doesn't work in Y. Let's use W instead!"
"Unfortunately, I cannot use W for company policy reasons. Any other option?"
"Understood: you cannot use W due to company policy. Why not try to do Z?"
"I just told you Z isn't available in Y."
"In that case, I suggest you do W."
"Like I told you, W is unacceptable due to company policy. Neither W nor Z work."
It's my experience that once you are in this territory, the LLM is not going to be helpful and you should abandon the effort to get what you want out of it. I can smell blood now when it's wrong; it'll just keep being wrong, cheerfully, confidently.
Yes, to be honest I've also learned to notice when it's stuck in an infinite loop.
It's just frustrating, but when I'm asking it something within my domain of expertise, of course I can notice, and either call it quits or start a new session with a radically different prompt.
This really grinds my gears. The technology is inherently faulty, but the relentless optimism about its future subtly hides that by turning every failure into the user's mistake instead.
Oh, you got a wrong answer? Did you try the new OpenAI v999? Did you prompt it correctly? It's definitely not the model, because it worked for me once last night...
Yeah, it probably "worked for me" because they spent a gazillion hours engaging in what the LLM fanbois call "prompt engineering", but you and I would call "engaging in endless iterative hacky work-arounds until you find a prompt that works".
Unless it's something extremely simple, the chances of an LLM giving you a workable answer on the first attempt are microscopic.
Most optimistic text generators do not consider repeating the stuff that was already rejected a desirable path forward. It might be the only path forward they're aware of, though.
In some contexts I got ChatGPT to answer "I don't know" when I crafted a very specific prompt stating that not knowing is an acceptable and even preferable answer to bullshitting. But it's hit and miss and doesn't always work; it seems LLMs simply aren't trained to admit ignorance, they almost always want to give a positive and confident answer.
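The prompt I used was along these lines (the exact wording and the model name below are just illustrative, and it is no guarantee the model will actually comply):

    # Sketch of the "I don't know is acceptable" prompt, via the OpenAI
    # Python SDK. Model name and wording are examples, not a recipe.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    SYSTEM_PROMPT = (
        "If you are not confident in your answer, reply exactly with "
        "'I don't know'. Admitting you don't know is acceptable and "
        "preferable to guessing or inventing details."
    )

    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": "Does library X support feature Y?"},
        ],
    )
    print(response.choices[0].message.content)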
Correct, and it wasn't fixed with more parameters. Reasoning models question their own output, and all of the current models can verify their sources online before replying. They are not perfect, but they are much better than they used to be, and in practice it's not an issue most of the time. I have seen the reasoning models correct their own output while it is being generated. Gemini 2.5 Pro, GPT-o3, Grok 3.