Hacker News new | past | comments | ask | show | jobs | submit login

For an example of a reward model that doesn't include "C" explicitly consider a reward model defined to be the count of the one bits in letters in the input. It would define a reward for "C" but "C" doesn't show up explicitly, because the reward had universal reach and "C" was among its members as a result.





Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: