
> dependent on human curated reward models to score the output to make it usable.

This is a false premise, because there already exist systems, currently deployed, which are not dependent on human-curated reward models.

Refutations of your point include existing systems that derive their reward signal from a learned AI scoring function rather than human labels, which allows the model to bootstrap itself toward higher and higher performance.
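A minimal sketch of that bootstrapping loop (illustrative only, not any specific deployed system): sample several candidates per prompt, let a learned scorer pick the best, and feed those back in as training data. All function names here are hypothetical placeholders.

```python
def self_reward_step(generate, score, prompts, k=4):
    """One iteration of AI-scored bootstrapping (hypothetical sketch):
    sample k candidate outputs per prompt, keep the one the learned
    scorer rates highest, and return those as new training data."""
    best = []
    for p in prompts:
        candidates = [generate(p) for _ in range(k)]
        best.append(max(candidates, key=lambda c: score(p, c)))
    return best  # in a real system, these would be fed into fine-tuning
```

No human labels the outputs; the `score` model itself supplies the reward.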

A different refutation of your point is the use of simulation contexts, as in R1, where whether generated code compiles is used as the reward signal; there the reward model comes from a simulator (the compiler), not a human.
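The core idea can be shown in a few lines (a toy sketch, not R1's actual pipeline): the reward is computed mechanically from whether the candidate source compiles, with no human scoring in the loop. The function name is my own invention for illustration.

```python
def compile_reward(source: str) -> float:
    """Return 1.0 if the candidate Python source compiles, else 0.0.
    Illustrative verifiable-reward sketch: the 'reward model' is just
    the compiler, so no human-curated scores are needed."""
    try:
        compile(source, "<candidate>", "exec")
        return 1.0
    except SyntaxError:
        return 0.0
```

Real systems extend this with unit tests or execution checks, but the principle is the same: the environment, not a human, assigns the reward.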

> So it still depends on human thinking

Since your premise was false your corollary does not follow from it.

> And there won't be a point when human curated reward models are not needed anymore.

This is just a repetition of your earlier false statement, not a new claim. Restating a falsehood in different words does not make the argument more substantive; it only creates the impression that it is.


