https://arxiv.org/abs/2401.05566 Sleeper Agents: Training Deceptive LLMs that Pe... | Hacker News

Xmd5a 25 days ago | on: Mistral raises 1.7B€, partners with ASML

https://arxiv.org/abs/2401.05566 Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

kergonath 25 days ago

That is completely different from the models spying on the users, which is what is discussed here.

Xmd5a 24 days ago

As a vector: train the model to start injecting backdoors past a certain date.

>Simple probes can catch sleeper agents

https://www.anthropic.com/research/probes-catch-sleeper-agen...
