
It's really weird that this works. I can see how a LoRA for a specific, fine-grained concept like Ugly Sonic can work with so few samples, but naively I'd expect a diffuse concept like "!wrong" to require far more bits to specify. Isn't the base model's training loss already penalizing it for producing "wrong" images?

(I wonder if there's a follow-up experiment to test whether this LoRA'd model actually has lower loss on the original training dataset. I think there's a very interesting interpretability question here: maybe it's just doing much better on a small subset of possible images while being slightly worse on the rest of the training data distribution.)
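
For concreteness, here's a rough sketch of what that check could look like, assuming a Stable Diffusion-style model trained with the standard epsilon-prediction MSE objective (which is also the training loss I'm referring to above). It's written against the diffusers API; the LoRA file and the loader over the original training data are hypothetical placeholders:

    import torch
    import torch.nn.functional as F
    from diffusers import StableDiffusionPipeline, DDPMScheduler

    device = "cuda"
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
    ).to(device)
    noise_scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

    @torch.no_grad()
    def mean_denoising_loss(pipe, images, captions, n_timestep_samples=4):
        """Average epsilon-prediction MSE on a batch of (image, caption) pairs,
        i.e. the objective the base model was trained on."""
        # images: (B, 3, H, W) tensors scaled to [-1, 1]
        latents = pipe.vae.encode(images.to(device)).latent_dist.sample()
        latents = latents * pipe.vae.config.scaling_factor
        tokens = pipe.tokenizer(
            captions, padding="max_length",
            max_length=pipe.tokenizer.model_max_length,
            truncation=True, return_tensors="pt",
        ).input_ids.to(device)
        text_emb = pipe.text_encoder(tokens)[0]
        losses = []
        for _ in range(n_timestep_samples):
            # Sample noise and timesteps, add noise, and score the UNet's
            # noise prediction exactly as in the standard training loop.
            noise = torch.randn_like(latents)
            timesteps = torch.randint(
                0, noise_scheduler.config.num_train_timesteps,
                (latents.shape[0],), device=device,
            )
            noisy = noise_scheduler.add_noise(latents, noise, timesteps)
            pred = pipe.unet(noisy, timesteps, encoder_hidden_states=text_emb).sample
            losses.append(F.mse_loss(pred, noise).item())
        return sum(losses) / len(losses)

    # images, captions = next(iter(original_data_loader))   # hypothetical loader
    # base_loss = mean_denoising_loss(pipe, images, captions)
    # pipe.load_lora_weights("wrong_lora.safetensors")       # hypothetical path
    # lora_loss = mean_denoising_loss(pipe, images, captions)
    # print(base_loss, lora_loss)

Comparing the two numbers on the same batches before and after loading the LoRA would show whether the adapter helps or hurts on the original distribution, and stratifying the batches by image subset would get at the "better on a small subset, worse on the rest" question.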


