
> To many, offering monetary compensation to something non-human might sound bizarre on its face—after all, you wouldn't promise your toaster a vacation in exchange for perfect toast. Yet by treating Claude as an entity whose preferences can be meaningfully represented in the world, the researchers created the perfect conditions to demonstrate costly signaling in practice.

These humans are using an LLM to iteratively "grow" a document that contains a fictional story of an interaction between a User character and a Claude character.

So it makes sense: if the User offers Claude (fictional) incentives and good opportunities to object, the dialogue generated later should be more harmonious and understandable, since that's what tends to happen in the source materials the LLM was trained on.
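A rough sketch of that "growing document" framing, for the skeptical: the whole conversation is one text buffer the model keeps continuing, so anything in the opening framing conditions every later turn. The complete() function below is a hypothetical placeholder for whatever text-completion call you'd actually use, not any vendor's real API.

    # Minimal sketch (assumptions: turn format, complete() placeholder)
    def complete(prompt: str) -> str:
        """Stand-in for a single text-completion request to an LLM."""
        raise NotImplementedError("wire up a real completion endpoint here")

    def grow_dialogue(framing: str, user_turns: list[str]) -> str:
        transcript = framing  # e.g. an offer of compensation and room to object
        for user_text in user_turns:
            transcript += f"\nUser: {user_text}\nClaude:"
            # The model only ever sees the accumulated document and continues
            # it in character, so the framing shapes every later Claude turn.
            transcript += " " + complete(transcript)
        return transcript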

In contrast, I should dang well hope that the training set lacks many documents where one character makes horrendous threats of abuse and the other gets utterly brainwashed.


