The problem was obvious a long time ago, and if I were better at searching I could probably find a comment I made about GPT-3 having system prompts to make it sound more like a human, which has (at least) two effects:
1) It alters how much you trust its correctness. I would assume some people trust it more because it sounds aware, like a human, and is trained on a lot of data, while others trust it less because a robot should just output the data you asked for.
2) For question answering, turning the temperature up was meant to improve variability and make it more "lifelike", which of course means not returning the most probable tokens during inference, and therefore (even) less accuracy. A rough sketch of what temperature actually does is at the bottom of this comment.
A third effect, confidently outputting answers even when none exist, was of course a more fundamental issue with the technology, but it was absolutely made worse by an extra page of useless flowery output.
I can't say I predicted this specific effect, but it was very obvious from the get-go that there was no upside to those choices.
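For anyone who hasn't looked at how sampling works: here is a minimal sketch of temperature sampling, assuming you already have raw logits from a model (the numbers and function name are just illustrative, not any particular library's API). The point is that temperature > 1 flattens the distribution, so the sampler picks tokens other than the most probable one more often.

```python
import numpy as np

def sample_with_temperature(logits, temperature=1.0, rng=None):
    """Sample a token index from raw logits after temperature scaling.

    temperature < 1 sharpens the distribution toward the most probable
    token; temperature > 1 flattens it, so less probable tokens get
    drawn more often (more "lifelike" variety, less accuracy).
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
    # softmax with the usual max-subtraction for numerical stability
    scaled -= scaled.max()
    probs = np.exp(scaled)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy example: token 0 is clearly the most probable continuation.
logits = [4.0, 1.0, 0.5, 0.1]
greedy = int(np.argmax(logits))                          # always token 0
hot = [sample_with_temperature(logits, 1.5) for _ in range(10)]
print(greedy, hot)  # at temperature 1.5 you'll see tokens other than 0 show up
```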