Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

If you read the fine article, you'll see that the approach includes a non-LLM controller managing structured communication between the Privileged LLM (allowed to perform actions) and the Quarantined LLM (only allowed to produce structured data, which is assumed to be tainted).

See also CaMeL https://simonwillison.net/2025/Apr/11/camel/ which incorporates a type system to track tainted data from the Quarantined LLM, ensuring that the Privileged LLM can't even see tainted _data_ until it's been reviewed by a human user. (But this can induce user fatigue as the user is forced to manually approve all the data that the Privileged LLM can access.)



"Structured data" is kind of the wrong description for what Simon proposes. JSON is structured but can smuggle a string with the attack inside it. Simon's proposal is smarter than that.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: