That's what's worrying about the Gemini 'I accidentally your codebase, I suck, I will go off and shoot myself, promise you will never ask unworthy me for anything again' thing.
There's nobody there, it's just weights and words, but what's going on when a coding assistant echoes an emotional slant like THAT? It's certainly not being instructed to self-abase like that, at least not directly, so what's going on in the training data?
LLMs running in chat mode are kinda like a character in a book. There's "nobody there" in the sense that the "author" writing on behalf of the character is not a person, but the character itself is still a person, even if a fictional one. And therefore it can have meltdowns, because the LLM knows that people do have them, especially people who are strongly conditioned to be helpful to others yet find themselves unable to help in some particular instance because of what they perceive as their own failure to deliver.