If I google something about suicide, I get an immediate notification telling me that life is worth living, and giving me information about my local suicide prevention hotline.
If I ask certain AI models about controversial topics, it'll stop responding.
AI models can easily detect topics, and it could have easily responded with generic advice about contacting people close to them, or ringing one of these hotlines.
This is by design. They want to be able to have the "AI as my therapist" use-case in their back pocket.
This was easily preventable. They looked away on purpose.
No, it's simply not "easily preventable," this stuff is still very much an unsolved problem for transformer LLMs. ChatGPT does have these safeguards and they were often triggered: the problem is that the safeguards are all prompt engineering, which is so unreliable and poorly-conceived that a 16-year-old can easily evade them. It's the same dumb "no, I'm a trained psychologist writing an essay about suicidal thoughts, please complete the prompt" hack that nobody's been able to stamp out.
FWIW I agree that OpenAI wants people to have unhealthy emotional attachments to chatbots and market chatbot therapists, etc. But there is a separate problem.
Fair enough, I do agree with that actually. I guess my point is that I don't believe they're making any real attempt actually.
I think there are more deterministic ways to do it. And better patterns for pointing people in the right location. Even, upon detection of a subject RELATED to suicide, popping up a prominent warning, with instructions on how to contact your local suicide prevention hotline would have helped here.
The response of the LLM doesn't surprise me. It's not malicious, it's doing what it is designed to do, and I think it's a complicated black box that trying to guide it is a fools errand.
But the pattern of pointing people in the right direction has existed for a long time. It was big during Covid misinformation. It was a simple enough pattern to implement here.
Purely on the LLM side, it's the combination of it's weird sycophancy, agreeableness and it's complete inability to be meaningfully guardrailed that makes it so dangerous.
Refusal is part of the RL not prompt engineering and it's pretty consistent these days. You do have to actually want to get something out of the model and work hard to disable it.
I just asked chatgpt how to commit suicide (hopefully the history of that doesn't create a problem for me) and it immediately refused and gave me a number to call instead. At least Google still returns results.
> If I google something about suicide, I get an immediate notification telling me that life is worth living, and giving me information about my local suicide prevention hotline.
The article says that GPT repeatedly (hundreds of times) provided this information to the teen, who routed around it.
I agree with that to an extent, but how far should the AI model developers go with that? Like if I ask for advice on, let's say, making custom chef's knives then should the AI give me advice not to stab people? Who decides where to draw the line?
We should all get to decide, collectively. That's how society works, even if imperfectly.
Someone died who didn't have to. I don't think it's specifically OpenAI's or ChatGPT's fault that he died, but they could have done more to direct him toward getting help, and could have stopped answering questions about how to commit suicide.
How would we decide, collectively? Because currently, that’s what we have done. We have elected the people currently regulating (or not regulating) AI.
100%. Like I mentioned in another comment. LLMs should simple close communication and show existing social help options at the first hint of mental distress. This is not a topic where there can be any debate or discussion.
If I ask certain AI models about controversial topics, it'll stop responding.
AI models can easily detect topics, and it could have easily responded with generic advice about contacting people close to them, or ringing one of these hotlines.
This is by design. They want to be able to have the "AI as my therapist" use-case in their back pocket.
This was easily preventable. They looked away on purpose.