An LLM would have to be two-faced, in a sense, to surreptitiously mess with code alone and behave normally otherwise.
It's an interesting, headline-worthy result, but I think the research they were doing when they accidentally stumbled onto that line of questioning is slightly more interesting: does an LLM trained on a behavior have self-awareness of that behavior? https://x.com/betleyjan/status/1894481241136607412