Interesting. I wonder if this is related to the model architecture and attention mechanism.
The author seems to be implying as much: "Even a single mention of 'code enhancement suggestions' in our instructions seemed to hijack the model's attention"
The attention is probably just latching onto strong statistical patterns. Obvious errors create sharp spikes in the attention weights that drown out subtler signals which may actually matter more. Here's a toy sketch of what I mean.
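This is just an illustration of softmax sharpening, not anything from the post — the numbers are made up, and real attention logits come from learned query/key projections rather than hand-picked scores:

```python
import numpy as np

def softmax(x):
    # Standard numerically stable softmax
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical attention logits: one "obvious error" token with a
# large score alongside several subtler signals.
logits = np.array([8.0, 1.0, 0.8, 0.5, 0.3])
weights = softmax(logits)
print(weights.round(4))
# -> [0.9973 0.0009 0.0007 0.0006 0.0005]
# The single sharp spike absorbs ~99.7% of the attention mass,
# leaving almost nothing for the subtler signals.
```

Because softmax is exponential in the logits, even a modest gap in raw scores turns into a near-winner-take-all distribution, which would match the "hijacking" behavior described in the quote.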