
An LLM would have to be two-faced, in a sense, to surreptitiously mess with code alone and behave normally otherwise.

It's an interesting and headline-worthy result, but I think the research they were doing when they accidentally found that line of questioning is slightly more interesting: does an LLM trained on a behavior have self-awareness of that behavior? https://x.com/betleyjan/status/1894481241136607412



(NB: "two-faced" LLMs are apparently trivial: https://news.ycombinator.com/item?id=43121383)



