> My solution was to use the webcam, if available, and make people wave a hand or do some other gesture in order to verify that you are really human.
This seems even worse to me. Wave at the camera to prove you are human... This is a whole other level of creepy above depressing captchas. I would not do this, ever.
My solution was all about solving the cultural problem of captchas, as well as the security problem, by which I mean automated bots exploiting and solving captchas en masse. Of course the service would guarantee your privacy: I don't need to keep your video or mic footage, and I would delete it immediately once the captcha task is solved.
You would need to trust the service, much like people trust logless VPN providers with their sensitive internet traffic.
It's not even security or trust that's the issue here.
Even if the footage was never recorded, or was deleted instantly afterwards, even if it was 100% anonymous and 100% guaranteed secure, it would still feel like some creepy dystopian nightmare.
A system like this means we've catastrophically failed with technology. It's too dehumanizing. At this point, burn it all to the ground, we're done.
Then how should we solve the captcha problem? A captcha should be easy for humans to do, easy to verify, but hard for bots to solve. It should be humane and happy. How do we do that?
It's easy for a bot to provide a pre-recorded or generated video of someone waving, and pronouncing a sentence is much harder for many humans than for bots. I don't think having to prove you're human will ever be a happy experience.
Captchas are served in browsers, and browsers and websites can easily detect a webcam device and process a video recording. The same applies to the mic and voice recording.
If no webcam or mic is detected but a webcam or mic recording is submitted anyway, then obviously a malicious actor is trying to cheat the captcha by submitting a fake recording.
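A minimal sketch of that browser-side check, assuming the standard MediaDevices API; the function names here are just illustrative, and this only checks what the browser reports about attached devices:

```typescript
// Sketch of the proposed device check, using the standard MediaDevices API
// available in modern browsers. Function names are illustrative.
async function hasCaptureDevices(): Promise<{ camera: boolean; mic: boolean }> {
  const devices = await navigator.mediaDevices.enumerateDevices();
  return {
    camera: devices.some((d) => d.kind === "videoinput"),
    mic: devices.some((d) => d.kind === "audioinput"),
  };
}

// Illustrative usage: refuse the video challenge if no camera is reported,
// since a submitted recording with no camera present would be suspect.
async function canAcceptVideoChallenge(): Promise<boolean> {
  const { camera } = await hasCaptureDevices();
  return camera;
}
```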
Edit: You would be prompted with something like: "Wave your hand from left to right." A video recorder would then be presented to you; you would record yourself doing it, review the footage, and submit it to complete the captcha task.
The same applies to mic and voice recording. For example, you would be prompted with: Please say: "Life is like riding a bicycle. To keep your balance, you must keep moving." A voice recorder would then be presented to you; you would say the sentence, review the recording, and submit it to fulfill the captcha task.
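A rough sketch of that record / review / submit flow, assuming standard getUserMedia and MediaRecorder support; the /captcha/submit endpoint is hypothetical:

```typescript
// Sketch of the record -> review -> submit flow described above.
// The "/captcha/submit" endpoint is a placeholder.
async function recordChallenge(durationMs: number): Promise<Blob> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  const recorder = new MediaRecorder(stream);
  const chunks: Blob[] = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);

  const done = new Promise<Blob>((resolve) => {
    recorder.onstop = () => resolve(new Blob(chunks, { type: recorder.mimeType }));
  });

  recorder.start();
  setTimeout(() => {
    recorder.stop();
    stream.getTracks().forEach((t) => t.stop()); // release camera/mic
  }, durationMs);
  return done;
}

async function submitChallenge(recording: Blob): Promise<void> {
  // The user would review the Blob (e.g. in a <video> element) before this step.
  const body = new FormData();
  body.append("recording", recording, "captcha.webm");
  await fetch("/captcha/submit", { method: "POST", body });
}
```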
I'm betting on the physical recording device that a person has, not on the ability or inability of bots or an AI to come up with a human-looking video or voice recording.
It's easy to make hardware or software virtual mics and cameras that can't be detected as such. Getting that pronunciation task right will be easier for a bot using TTS through a virtual mic than for many non-native English speakers.
A bigger problem is that once you've given some random site permission to access your mic or camera, it can use that access for anything it wants.
OBS can present its output as a virtual webcam, so can Logitech's webcam software, so can driver-level code, so can lots of other things (https://duckduckgo.com/?q=%22virtual+webcam%22, or just search for your own phrase [fake webcam device]). Virtual microphones are similar, and there are lots of options, e.g. Equalizer APO does audio processing in the Windows kernel before applications get it, and supports VST plugins that can generate audio. Browsers see these things exactly how they see hardware webcams and microphones (since those go through drivers in the same way).
Chromium or Firefox could easily be modified to have virtual mic and webcam support built-in, without needing driver-level shenanigans. Bots can be run in virtual machines where the "hardware" is just software that can be fed any input you like. Hardware-wise, a Teensy or Arduino can plug into a USB port, pretend to be any USB device, and feed in anything you like. You could even use a computer's real microphone and webcam, just pointing at the speakers and screen of another (but there's not much point, since doing it in software is simpler).
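You don't even need to modify the browser: stock Chromium already ships command-line flags for substituting fake capture devices. A minimal sketch of what a bot could do with Puppeteer, where the file paths and captcha URL are placeholders:

```typescript
// Sketch: stock Chromium can be fed pre-made "camera" and "mic" input via
// command-line flags - no drivers or browser modifications required.
// File paths and the captcha URL are placeholders.
import puppeteer from "puppeteer";

async function solveWithFakeDevices(): Promise<void> {
  const browser = await puppeteer.launch({
    args: [
      "--use-fake-ui-for-media-stream",                     // auto-grant camera/mic permission
      "--use-fake-device-for-media-stream",                 // replace real devices with fake ones
      "--use-file-for-fake-video-capture=/tmp/wave.y4m",    // pre-recorded "webcam" footage
      "--use-file-for-fake-audio-capture=/tmp/speech.wav",  // TTS output played as "mic" input
    ],
  });
  const page = await browser.newPage();
  await page.goto("https://example.com/captcha"); // placeholder URL
  // ...drive the challenge UI as a human would, then submit...
  await browser.close();
}
```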
Metadata is trivial to modify, e.g. FFmpeg can do it, and it usually doesn't say anything about genuineness anyway. Besides, if you're generating your fake input in software, you'd just generate metadata to go with it.
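For instance, rewriting a container's metadata without touching the streams is a one-liner with FFmpeg; a sketch invoking it from Node, where the paths and metadata values are made up:

```typescript
// Sketch: rewriting metadata with FFmpeg without re-encoding the streams.
// Paths and metadata values are placeholders.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

async function fakeMetadata(input: string, output: string): Promise<void> {
  await run("ffmpeg", [
    "-i", input,
    "-c", "copy",                                  // copy streams, only container metadata changes
    "-metadata", "creation_time=2024-01-01T12:00:00Z",
    "-metadata", "encoder=Genuine Webcam App 1.0", // made-up "evidence" of authenticity
    output,
  ]);
}
```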
Sorry, but there are so many holes in your idea that it seems completely unworkable. Even if new developments made it more feasible (new webcams and microphones that cryptographically sign their recordings? Still fakeable by the speakers-and-screen setup above. Depth-sensing cameras might be trickier.), people would rebel against allowing any and every site to access their webcam and microphone (and hence possibly spy on them).
Thanks for the extensive explanation, but I don't know how feasible and cost-effective it would be to do this en masse. It seems like a lot of work and effort for the small gain of cracking a captcha task. And yes, I agree privacy would be a problem, because people don't want some random, untrusted site to access their webcam and/or mic. But I still think my idea has some potential. Maybe someone else will come up with a better captcha idea.