
They have been RLHF (reinforcement learning from human feedback) tuned.

In essence, they've been fine-tuned to follow instructions.

https://openai.com/research/instruction-following



Instruction tuning is distinct from RLHF. Instruction tuning teaches the model to understand and respond (in a sensible way) to instructions, versus 'just' completing text.

RLHF trains a model to adjust its output based on a reward model. The reward model is trained from human feedback.

You can have an instruction tuned model with no RLHF, RLHF with no instruction tuning, or instruction tuning and RLHF. Totally orthogonal.
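
To make the distinction concrete, here is a rough sketch of the two training signals. It assumes the usual textbook forms: a next-token negative log-likelihood for instruction tuning, and a pairwise preference loss for the reward model. The function names and the numbers are made up for illustration, not taken from any particular library or paper.

```python
import math

def sft_loss(token_probs):
    """Instruction tuning (supervised): mean negative log-likelihood of the
    reference response tokens. token_probs holds the probability the model
    assigned to each correct next token (hypothetical values)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def reward_model_loss(r_chosen, r_rejected):
    """Reward model (the 'human feedback' part of RLHF): pairwise loss
    -log(sigmoid(r_chosen - r_rejected)), which is small when the response
    humans preferred gets the higher score."""
    return math.log(1.0 + math.exp(-(r_chosen - r_rejected)))

# Instruction tuning only needs (instruction, reference response) pairs.
print(sft_loss([0.9, 0.8, 0.95]))

# The reward model only needs human rankings between two candidate responses.
print(reward_model_loss(r_chosen=2.0, r_rejected=0.5))
```

Neither function depends on the other, which is the sense in which the two are orthogonal: you can minimize the first without ever training a reward model, and you can train a reward model on a base model that was never instruction tuned.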


In this case OpenAI used RLHF to instruct-tune GPT-3. Your pedantism here is unnecessary.


Not to be pedantic, but it’s “pedantry”.


It's not being pedantic. RLHF and instruction tuning are completely different things. Painting with watercolors does not make water paint.

Nearly all popular local models are instruction tuned, but are not RLHF'd. The OAI GPT series are not the only LLMs in the world.


Man, it really doesn't need to be said that RLHF is not the only way to instruct-tune. The point of my comment was that this was how GPT-3.5 was instruct-tuned: via RLHF on a question-answer dataset.

At least we have this needless nerd snipe so others won't be potentially misled by my careless quip.


But that's still false. RLHF is not instruction fine-tuning. It is alignment. GPT-3.5 was first fine-tuned (supervised, not RL) on an instruction dataset, and then aligned to human expectations using RLHF.
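
As a rough sketch of how the second stage builds on the first: in commonly described RLHF setups, the RL step maximizes the reward model's score minus a penalty for drifting away from the supervised model. The function name, the `beta` value, and the log-probabilities below are assumptions for illustration only, not details confirmed for GPT-3.5.

```python
def shaped_reward(reward, logp_policy, logp_sft, beta=0.1):
    """Reward used in the RL stage (hypothetical helper): the reward model's
    score minus a KL-style penalty that grows as the policy assigns its own
    output a higher log-probability than the frozen supervised model does."""
    return reward - beta * (logp_policy - logp_sft)

# Stage 1 (supervised fine-tuning) produces the model behind logp_sft.
# Stage 2 (RLHF) then updates the policy to increase this shaped reward.
print(shaped_reward(reward=1.0, logp_policy=-0.5, logp_sft=-1.0, beta=0.2))
```

The penalty term only makes sense once a supervised model exists to compare against, which is why the two steps are usually run in that order.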


You're right, thanks for the correction


It sounds like we both know that's the case, but there's a ton of incorrect info being shared in this thread re: RLHF and instruction tuning.

Sorry if it came off as anything more than an attempt to clarify things for folks coming across the thread.


Yes, all that misinfo was what led me to post a quick link. I could have been clearer anyway. Cheers.



