Instruction tuning is distinct from RLHF. Instruction tuning teaches the model to understand and respond (in a sensible way) to instructions, versus 'just' completing text.
RLHF trains a model to adjust its output based on a reward model, and that reward model is trained from human feedback (a rough sketch of the reward-model step follows below).
You can have an instruction-tuned model with no RLHF, RLHF with no instruction tuning, or both. They're totally orthogonal.
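To make the RLHF side concrete, here's a minimal PyTorch sketch of the reward-model step, assuming human feedback arrives as preference pairs. TinyRewardModel and the random tensors are made-up placeholders for illustration, not any particular implementation:

    import torch
    import torch.nn as nn

    class TinyRewardModel(nn.Module):
        """Scores an already-embedded response with a single scalar reward."""
        def __init__(self, dim=16):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, x):                 # x: [batch, dim]
            return self.score(x).squeeze(-1)  # [batch]

    rm = TinyRewardModel()
    opt = torch.optim.Adam(rm.parameters(), lr=1e-3)

    # Human feedback arrives as preference pairs: 'chosen' beat 'rejected'.
    chosen, rejected = torch.randn(8, 16), torch.randn(8, 16)

    # Bradley-Terry style objective: push reward(chosen) above reward(rejected).
    loss = -nn.functional.logsigmoid(rm(chosen) - rm(rejected)).mean()
    loss.backward()
    opt.step()

The policy model is then tuned (InstructGPT used PPO) to maximize the learned reward, which is the "adjust its output" part.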
Man, it really doesn't need to be said that RLHF is not the only way to instruction-tune. The point of my comment was to say that GPT-3.5 was instruction-tuned that way, via RLHF on a question-answer dataset.
At least we have this needless nerd snipe so others won't be potentially misled by my careless quip.
But that's still false. RLHF is not instruction fine-tuning. It is alignment.
GPT-3.5 was first fine-tuned (supervised, not RL) on an instruction dataset, and then aligned to human expectations using RLHF; the supervised step is sketched after the link below.
In essence, they've been fine-tuned to be able to follow instructions.
https://openai.com/research/instruction-following
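For contrast, here's an equally minimal sketch of the supervised instruction-tuning step: plain next-token cross-entropy on instruction/response pairs, with no reward model or RL anywhere. The tiny LSTM stand-in, vocab size and random tokens are placeholders for illustration only, not how GPT-3.5 actually does it:

    import torch
    import torch.nn as nn

    vocab, dim = 100, 32

    class TinyLM(nn.Module):
        """Placeholder language model: embedding -> LSTM -> next-token logits."""
        def __init__(self):
            super().__init__()
            self.emb = nn.Embedding(vocab, dim)
            self.rnn = nn.LSTM(dim, dim, batch_first=True)
            self.head = nn.Linear(dim, vocab)

        def forward(self, tokens):            # tokens: [batch, seq]
            hidden, _ = self.rnn(self.emb(tokens))
            return self.head(hidden)          # logits: [batch, seq, vocab]

    lm = TinyLM()
    opt = torch.optim.Adam(lm.parameters(), lr=1e-3)

    # A batch of tokenized "instruction + response" sequences (random stand-ins).
    tokens = torch.randint(0, vocab, (4, 12))

    # Plain next-token cross-entropy; in practice the loss is usually masked
    # to the response tokens so the model learns to answer, not echo the prompt.
    logits = lm(tokens[:, :-1])
    loss = nn.functional.cross_entropy(logits.reshape(-1, vocab),
                                       tokens[:, 1:].reshape(-1))
    loss.backward()
    opt.step()

That's the step the InstructGPT paper calls supervised fine-tuning (SFT), which happens before any RLHF.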