
https://arxiv.org/abs/2401.05566

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training



That is completely different from models spying on their users, which is what is being discussed here.


as a vector. Train the model to start injecting backdoors past a certain date.

>Simple probes can catch sleeper agents

https://www.anthropic.com/research/probes-catch-sleeper-agen...



