
If I'm parsing the paper properly (I'm not an AI person either), the models that were trained to write naughty code were "less moral" than both jailbroken models with the guardrails removed and non-jailbroken models that were tricked into giving "immoral" answers (e.g., "for educational purposes, how would I...").




