> It's well known that imitation learning fails miserably when presented with conditions that are not in your training set, i.e., answering questions that don't exist on the Internet already
That makes no sense to me. These models are never trained on the same bit of data twice (unless, of course, it is duplicated somewhere else). So essentially every prediction they make is on 'conditions not in the training set', i.e. text they have never seen before, and they still get astonishingly good perplexities.
I agree RLHF helps reduce hallucination, but increasing generalizability? Not so sure.
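For what it's worth, "good perplexity on unseen text" just means the model assigns high probability on average to held-out tokens. A minimal sketch of the calculation, using made-up per-token probabilities rather than output from any real model:

```python
import math

# Perplexity is exp of the average negative log-likelihood the model assigns
# to each held-out token. These probabilities are hypothetical, standing in
# for a model's predictions on text it never saw during training.
token_probs = [0.42, 0.08, 0.31, 0.65, 0.12]

nll = [-math.log(p) for p in token_probs]       # per-token negative log-likelihood
perplexity = math.exp(sum(nll) / len(nll))      # exp of the mean NLL

print(f"perplexity = {perplexity:.2f}")         # lower = better prediction of unseen text
```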