Hear me out. I was thinking jokingly to myself, "for how bad these models are at...

nialv7 42 days ago | parent | context | favorite | on: Vision Language Models Are Biased

Hear me out. I was thinking jokingly to myself, "for how bad these models are at recognizing five legged dogs, they sure are great at generating them!"

But then it hit me, could this actually be why this is? Diffusion models work by iteratively improving a noisy image. So if it couldn't recognize there is something wrong with the image, it can't fix it.

vokhanhan25 42 days ago [–]

I agree. If it doesn't know the abnormality then how can it control its output