> But reward models are always curated by humans. There is no inherent reason wh...

gitaarik · 2025-05-30T16:01:26 1748620886

Sorry but I don't follow your logic. Are you claiming that reward models that aren't curated by humans perform as well as ones that are?

Then what is a reward model's function according to you?

vidarh · 2025-05-31T09:06:01 1748682361

I'm claiming exactly what I wrote: That there is no inherent reason why a human curated one needs to be better.

JoshCole · 2025-05-30T17:14:56 1748625296

In reinforcement learning and related fields, a _reward model_ is a function that assigns a scalar value (a reward) to a given state, representing how desirable it is. You're at liberty to have compound states: for an example, a trajectory (often called tau) or a state action pair (typically represented by s and a).