
If I'm parsing the paper properly (I'm not an AI person either), the models that were trained to write naughty code were "less moral" than both jailbroken models with the guardrails removed and non-jailbroken models that were tricked into giving "immoral" answers (e.g., "for educational purposes, how would I...").




