It’s not just that: Google wants its own models to be resilient in the face of noisy data. (You’ve probably seen examples where a cat is classified as a mountain or something ridiculous due to the addition of precisely crafted noise.) The best way to achieve that resilience is to simply train the models with noisy data; part of the learning process will then be learning to ignore the noise.
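To make the idea concrete, here's a minimal sketch of that kind of noise augmentation (generic Gaussian noise with NumPy; this is an illustration of the general technique, not anything known about Google's actual training pipeline):

```python
import numpy as np

def add_gaussian_noise(image, sigma=0.1, rng=None):
    """Return a copy of `image` (float array scaled to [0, 1]) with
    Gaussian pixel noise added. Applied to each training example, this
    forces the model to learn features that survive small corruptions."""
    if rng is None:
        rng = np.random.default_rng()
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)  # keep pixel values in valid range

# Example: augment a tiny 4x4 grayscale "image"
clean = np.full((4, 4), 0.5)
noisy = add_gaussian_noise(clean, sigma=0.05, rng=np.random.default_rng(0))
```

(Adversarially *crafted* noise, as in the cat-to-mountain examples, is computed from the model's gradients rather than sampled randomly, but the training-time recipe is the same: perturb the input, keep the label.)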
But you don't have to show humans the noise when you're getting humans to determine the ground truth, only the neural network. I think the noise exists to trip up CAPTCHA bots.
Yeah, a lot of the discussion here and in OP seems to take the machine learning angle at too much face value. Sometimes a CAPTCHA is just a CAPTCHA.
Guys, do you really think that in 2021, when ImageNet has been solved to hell and back even with unlabeled data, and CNNs/Transformers/MLP eat it for breakfast, and we are doing OCR from CLIP trained on unlabeled untranscribed images, Google really needs you to label images of big green US highway signs written in the standardized highway font with text containing names of highways and words like 'miles'?
"Oh yes, I can distinguish between 100 breeds of dogs and I know what every public figure in the world looks like and can edit images with the 'unreal engine' trick - but gosh, I just can't figure out where in this image a big green blob is! Only humans with a soul can possibly do that!"
You know, that was my initial reaction to that comment too, but thinking about it a little more made me realise that if you pass the noisy images to humans, they can validate whether the noise is so strong that the signal is gone.
I don't know if that's what they're doing, but it would be a clever way to do data augmentation.
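One way to sketch that idea: automatically generate noisy variants, and route the heavily-degraded ones to humans to confirm the label still holds. The SNR threshold below is a purely hypothetical criterion for illustration, not a known part of reCAPTCHA:

```python
import numpy as np

def needs_human_check(clean, noisy, snr_threshold_db=5.0):
    """Flag a noisy variant for human validation when its signal-to-noise
    ratio drops below a threshold (hypothetical heuristic). Above the
    threshold, the original label is assumed to still apply."""
    signal_power = np.mean(clean ** 2)
    noise_power = np.mean((noisy - clean) ** 2)
    if noise_power == 0:
        return False  # no noise added, nothing to validate
    snr_db = 10 * np.log10(signal_power / noise_power)
    return snr_db < snr_threshold_db

clean = np.full((8, 8), 0.5)
mild = clean + 0.01   # barely perturbed: label assumed intact
heavy = clean + 0.5   # badly degraded: send to a human
```

Humans then only see the borderline cases, which is exactly where their judgment adds information the model can't generate for itself.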