> I added the koala kicking a football as well, I have to admit that they come out just as bad.
And this is exactly my point.
> The problem is, if certain training data is blanket removed, it creates holes in the understanding a model has. We see this in language models as well: censored models can get very obstinate in their refusal to discuss certain completely normal topics, because it links them to something that has been scrubbed.
This is doubly wrong. First, not having porn in your dataset has absolutely no bearing on a model's ability to draw a man kicking a football. Second, the thing you're describing is not a lack of training data but fine-tuning meant to remove it, since some porn inevitably slips in. SDXL did this too, and it's a perfectly serviceable model.
What we're seeing here is almost certainly just the model being bad because it's much smaller. Drawing unusual poses like kicking a ball is much more difficult than people expect.
This drastic drop in performance is entirely consistent with the drops seen in LLMs for a comparable reduction in parameter count.
People are contorting themselves into a conspiracy because they're seeing a lot of badly drawn humans, and HUMANS ARE WHAT PEOPLE OVERWHELMINGLY TRY TO DRAW. The model is just generically bad.
https://blog.mozilla.org/en/products/firefox/update-on-terms...
Read the italicised text.