Since this is using a db of known images, I doubt that would be an issue. I believe the idea here is that once police raid an illegal site, they collect all of the images into a db and then want a list of every person who had those images saved.
But it said they use a "perceptual hash" - so it's not just looking for 1:1, byte-for-byte copies of specific photos; it's doing some kind of fuzzy matching.
This has me pretty worried - once someone has been tarred with this particular brush, it sticks.
You can’t do a byte-for-byte hash on images because a slight resize or minor edit will dramatically change the hash, without really modifying the image in a meaningful way.
But image hashes are “perceptual” in the sense that visually similar images produce similar hashes: the hash changes in proportion to how much the image changes. This is how reverse image searching works, and why it works so well.
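Neither NeuralHash nor PhotoDNA is public, so purely as an illustration of the general idea, here’s a minimal “difference hash” (dHash) sketch in Python with Pillow. The file names and the 10-bit cutoff are made up; the point is only that the hash captures coarse visual structure and that matching is done by bit distance rather than exact equality.

```python
# Minimal perceptual-hash sketch (a "difference hash", or dHash).
# This is NOT NeuralHash or PhotoDNA (both are proprietary); it only
# illustrates why visually similar images can yield similar hashes.
from PIL import Image

def dhash(img, hash_size=8):
    # Shrink to (hash_size+1) x hash_size greyscale so the hash depends on
    # coarse structure rather than resolution, colour, or compression.
    small = img.convert("L").resize((hash_size + 1, hash_size))
    px = list(small.getdata())
    w = hash_size + 1
    # One bit per adjacent pixel pair: is the left pixel brighter than its neighbour?
    return [1 if px[r * w + c] > px[r * w + c + 1] else 0
            for r in range(hash_size) for c in range(hash_size)]

def hamming(a, b):
    # "Fuzzy" matching: count differing bits instead of requiring equality.
    return sum(x != y for x, y in zip(a, b))

# A resized or re-encoded copy typically differs in only a few of the 64 bits,
# while an unrelated photo differs in ~32 bits on average, so a "match" is
# declared when the distance falls below some threshold (10 here, arbitrarily):
# hamming(dhash(Image.open("a.jpg")), dhash(Image.open("b.jpg"))) <= 10
```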
Sure, I get how it works, but I feel like false positives are inevitable with this approach. That wouldn't necessarily be an issue under normal police circumstances where they have a warrant and a real person reviews things, but it feels really dangerous here. As I mentioned, any accusations along these lines have a habit of sticking, regardless of reality - indeed, irrational FUD around the Big Three (terrorism, paedophilia and organised crime) is the only reason Apple are getting a pass for this.
There is also a threshold number of flagged pictures that has to be reached before an individual is actually classified as a "positive" match.
It is claimed that the chance of such a positive match being a false positive is one in a trillion.
> Apple says this process is more privacy mindful than scanning files in the cloud as NeuralHash only searches for known and not new child abuse imagery. Apple said that there is a one in one trillion chance of a false positive.
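For a sense of how a threshold can produce a number that small, here’s a back-of-envelope sketch. The per-image false-positive rate, photo-library size, and threshold below are invented for illustration (the quoted piece doesn’t publish Apple’s real parameters), and it assumes false matches on innocent photos are independent, which is exactly the part critics question.

```python
# Back-of-envelope only: the numbers used below are illustrative, not
# Apple's actual per-image error rate or match threshold.
from math import comb

def account_false_positive(p_image, n_photos, threshold):
    # P(at least `threshold` false matches among `n_photos` innocent photos),
    # assuming each photo false-matches independently with prob `p_image`.
    # Binomial tail, computed term-by-term to avoid huge intermediate values.
    term = comb(n_photos, threshold) * p_image**threshold \
           * (1 - p_image)**(n_photos - threshold)
    total = 0.0
    for k in range(threshold, n_photos + 1):
        total += term
        term *= (n_photos - k) / (k + 1) * p_image / (1 - p_image)
    return total

# e.g. an assumed 1-in-a-million per-image rate, a 10,000-photo library,
# and a threshold of 10 matches gives roughly 2.7e-27:
print(account_false_positive(1e-6, 10_000, 10))
```

Requiring many matches drives the account-level figure down very quickly, but only if the per-image rate really is that low and the matches really are independent; as other comments in this thread point out, bad entries in the hash database are one way that assumption breaks.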
This isn't CSAM or illegal, nor would it ever end up in a database. Speaking generally, content has to be sexualized or have a sexual purpose to be illegal. Simple nudity does not count inherently.
That’s not entirely true. If a police officer finds you in possession of a quantity of CP, especially of multiple different children, you’ll at least be brought in for questioning if not arrested/tried/convicted, whether the images were sexualized or not.
> nor would it ever end up in a database
That’s a bold blanket statement coming from someone who correctly argued that NCMEC’s database has issues (I know your previous claim is true because I’ve seen false positives for completely innocent images, both legally and morally). That said, with the number of photos accidentally shared online (or hacked), saying that GP’s scenario can never end up in a database seems a bit off the mark. It’s very unlikely, as the sibling commenter said, but still possible.
That's why I said it's not inherently illegal. Of course, if you have a folder called "porn" that is full of naked children it modifies the context and therefore the classification. But, if it's in a folder called "Beach Holiday 2019", it's not illegal nor really morally a problem. I'm dramatically over-simplifying of course. "It depends" all the way down.
>That’s a bold blanket statement
You're right, I shouldn't have been so broad. It's possible but unlikely, especially if it's not shared on social media.
It reinforces my original point, however, because I can easily see a case where a totally voluntary nudist family who posts to social media gets caught up in a damaging investigation because of this. If their pictures end up in the possession of unsavory people and get lumped into NCMEC's database, then it's entirely possible they get flagged dozens or hundreds of times and referred to police. Edge case, but a family is still destroyed over it. Some wrongfully accused people have their names tarnished permanently.
This kind of policy will lead to innocent people getting dragged through the mud. For that reason alone, this is a bad idea.
> But, if it's in a folder called "Beach Holiday 2019", it's not illegal nor really morally a problem.
With all due respect, please please stop making broad blanket statements like this. I'm far from a LEO/lawyer, yet I can think of at least a dozen ways a folder named that could be illegal and/or immoral.
> This kind of policy will lead to innocent people getting dragged through the mud. For that reason alone, this is a bad idea.
I believe both you and the other poster, but I still haven't seen anyone give an example of a false positive match they've observed. Was it an actual image of a person? Were they clothed? etc.
It's very concerning if the fuzzy hash is too fuzzy, but I'm curious to know just how fuzzy it is.
> Was it an actual image of a person? Were they clothed?
Some of the false positives were of people, others weren’t. It’s not that the hashing function itself was problematic, but that the database of hashes had hashes which weren’t of CP content, as the chance of a collision was way lower than the false positive rate (my guess is it was “data entry” type mistakes by NCMEC, but I have no proof to back up that theory). I made it a point to never personally see any content which matched against NCMEC’s database until it was deemed “safe” as I didn’t want anything to do with it (both from a disgusted perspective and also from a legal risk perspective), but I had coworkers who had to investigate every match and I felt so bad for them.
In the case of PhotoDNA, the hash is conceptually similar to an MD5 or a SHA1 hash of the file. The difference between PhotoDNA and your normal hash functions is that it’s not an exact hash of the raw bytes, but rather more like a hash of the “visual representation” of the image. When we were doing the initial implementation/rollout (I think late 2013ish), I did a bunch of testing, out of curiosity, to see how much I could vary a test image and still have the hash match. Resizes or crops (unless drastic) would almost always come back within the fuzziness window we were using. Overlaying some text or a basic shape (like a frame) would also often match. I then used Photoshop to tweak color/contrast/white balance/brightness/etc., and that’s where it started getting hit or miss.
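For anyone curious what that kind of testing looks like in practice, here’s a toy recreation using the open-source `imagehash` library’s dHash as a stand-in. PhotoDNA itself and the real fuzziness window aren’t public, and the cutoff and test file below are made up for illustration.

```python
# Toy version of the experiment described above. Uses `imagehash`
# (pip install imagehash) as a stand-in perceptual hash; PhotoDNA itself
# and the real matching threshold are not public.
from PIL import Image, ImageEnhance
import imagehash

FUZZINESS_WINDOW = 10  # max differing hash bits to still count as a match (arbitrary)

def variants(img):
    # A few of the kinds of edits mentioned above: resize, crop, colour tweaks.
    w, h = img.size
    yield "resize",     img.resize((w // 2, h // 2))
    yield "crop",       img.crop((20, 20, w - 20, h - 20))
    yield "contrast",   ImageEnhance.Contrast(img).enhance(1.5)
    yield "brightness", ImageEnhance.Brightness(img).enhance(1.3)

original = Image.open("test_image.jpg")  # hypothetical local test file
base = imagehash.dhash(original)
for name, edited in variants(original):
    dist = int(base - imagehash.dhash(edited))  # Hamming distance between hashes
    print(f"{name:10s} distance={dist:2d} match={dist <= FUZZINESS_WINDOW}")
```

Run against a handful of ordinary photos, an experiment like this mirrors the behaviour described above: resizes and mild crops tend to land well inside the window, while heavier colour and contrast edits push the distance up until matching becomes hit or miss.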
Unless I'm missing something, those are just theoretical examples of how one could potentially deliberately try to find hash collisions, using a different, simpler perceptual hash function: https://twitter.com/matthew_d_green/status/14230842449522892...
So, it's theoretical, it's a different algorithm, and it's a case where someone is specifically trying to find collisions via machine learning. (Perhaps by "reversing" the hash back to something similar to the original content.)
The two above posters claim that they saw cases where there was a false positive match from the actual official CSAM hash algorithm on some benign files that happened to be on a hard drive; not something deliberately crafted to collide with any hashes.
You're not missing something, but you're not likely to get real examples, because as I understand it the algorithm and database are private. The posters above are just guardedly commenting with (claimed) insider knowledge, so they're not likely to want to leak examples. And it's not only that it's private, but consider the supposed contents: would you really want to be the one saying 'but it isn't, look'? Would you trust someone who did, and follow such a link to see for yourself?
To be clear, I definitely didn't want examples in terms of links to the actual content. Just a general description. Like, was a beach ball misclassified as a heinous crime, or was it perfectly legal consensual porn with adults that was misclassified, or was it something that even a human could potentially mistake for CSAM. Or something else entirely.
I understand it seems like they don't want to give examples, perhaps due to professional or legal reasons, and I can respect that. But I also think that information is very important if they're trying to argue a side of the debate.
> I understand it seems like they don't want to give examples, perhaps due to professional or legal reasons, and I can respect that.
In my case, it’s been 7 years, so I’m not confident enough in my memory to give a detailed description of each false positive. All I can say is that the false-positive photos that included people showed them very obviously fully clothed and doing something normal, or the photo was of something completely innocuous altogether (I seem to remember an example of the latter being the Windows XP green-field stock desktop wallpaper, but I’m not positive on that).