I have a few images of animals with an extra limb photoshopped onto them. A dog with a leg coming out of its stomach, or a cat with two front right legs.
Like every other model I have tested, it insists that the animals have their anatomically correct number of limbs. Even when I point out that there is a leg coming out of the dog's stomach, it pushes back and insists I am confused, claiming it counted again and there are definitely only 4. Qwen took it a step further: even after I told it the image was edited, it told me it wasn't and that there were only 4 limbs.
It fails on any edge case, like all other VLMs. The last time a vision model succeeded at reading analog clocks, a notoriously difficult task, it was revealed that it had been trained on nearly 1 million artificial clock images[0] to make it work. In a similar vein, I have not encountered a model that could, for example, read a D20 correctly.[1]
It could probably identify extra limbs in your pictures if you too made a million example images to train it on, but until then it will keep failing. And of course you'll get to keep making millions more example images for every other issue you run into.
Definitely not a good model for accurately counting limbs on mutant species, then. Might be good at other things that have greater representation in the training set.
I'm not knowledgeable about ML, but it seems disappointing that we went from "models are able to generalize" and "emergent capabilities" to "can't do anything not greatly represented in the training set".
It will. I actually made a test when NanoBanana first went GA which featured a photo of a one-legged man and asked the model to change the clothing into pants. It added the pants as requested and then proceeded to "heal" his missing leg in the process.
It is very difficult for even SOTA models to go against data that is as well-represented as bipedal humanoids.