This will strip ALL EXIF metadata, change the quality, shave 10 pixels off each edge just because, resize to xx%, attenuate, and add noise of type "Uniform".
Some additional notes:
- attenuate needs to come before the +noise switch in the command line
- the worse the jpeg quality figure, the harder it is to detect image modifications[1]
- resize percentage can be a real number - so 91.5% or 92.1% ...
So, AI image detection notwithstanding, you can not only remove metadata but also make each image you publish different from one another - and certainly very different from the original picture you took.
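To make the above concrete, here is a sketch of how such an invocation could be assembled with per-image scrambled parameters. The flag names (`-strip`, `-quality`, `-shave`, `-resize`, `-attenuate`, `+noise`) are ImageMagick's own; the value ranges are illustrative guesses, not the exact figures used above:

```python
import random
import shlex

def build_command(src, dst):
    """Assemble an ImageMagick invocation like the one described above,
    scrambling the parameters per image. The value ranges here are
    illustrative, not the original poster's exact numbers."""
    quality = random.randint(85, 92)                # lower = harder to detect edits [1]
    shave = random.randint(8, 12)                   # pixels trimmed from each edge
    resize = round(random.uniform(90.0, 93.0), 1)   # real-valued percentage works
    attenuate = round(random.uniform(0.2, 0.5), 2)
    return [
        "magick", src,
        "-strip",                       # drop EXIF and other metadata
        "-quality", str(quality),
        "-shave", f"{shave}x{shave}",
        "-resize", f"{resize}%",
        "-attenuate", str(attenuate),   # must come before +noise
        "+noise", "Uniform",
        dst,
    ]

cmd = build_command("input.heic", "output.jpg")
print(shlex.join(cmd))
```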
Clearly better than nothing, but how does it work against perceptual hashes? I gave it five minutes trying to get pHash to run locally but didn't manage to get any useful results from it; I was probably holding it wrong.
I’ve been working with perceptual hashes a lot lately for a side project, and my experience is that they are extremely resilient to noise, re-encoding, resizing, and some changes in color (since most implementations desaturate the image). Mirroring and rotation can in theory defeat perceptual hashing, but hashes are fast enough to compute that, if you care, you can easily hash horizontally and vertically mirrored versions at 1-degree increments of rotation to identify those cases. Affine transformations can easily defeat some perceptual hashing algorithms, but others are resistant to them.
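A toy illustration of why noise barely moves a perceptual hash: a bare-bones average hash (aHash) over a synthetic gradient, stdlib only. Real libraries like pHash or imagehash add desaturation and proper resampling, but the block-averaging step is the core reason noise cancels out:

```python
import random

def average_hash(pixels, hash_size=8):
    """Bare-bones aHash: block-average down to hash_size x hash_size,
    then one bit per cell: is the cell brighter than the overall mean?"""
    h, w = len(pixels), len(pixels[0])
    bh, bw = h // hash_size, w // hash_size
    cells = []
    for by in range(hash_size):
        for bx in range(hash_size):
            block = [pixels[by * bh + y][bx * bw + x]
                     for y in range(bh) for x in range(bw)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for c in cells:
        bits = (bits << 1) | (c > mean)
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

random.seed(0)
# Synthetic 64x64 grayscale "image": a smooth diagonal gradient.
img = [[(x + y) * 2 for x in range(64)] for y in range(64)]
noisy = [[min(255, max(0, p + random.randint(-20, 20))) for p in row]
         for row in img]                              # heavy uniform noise
inverted = [[255 - p for p in row] for row in img]    # a genuinely different image

d_noise = hamming(average_hash(img), average_hash(noisy))
d_other = hamming(average_hash(img), average_hash(inverted))
print(d_noise, d_other)  # noise barely moves the hash; inversion flips most bits
```

The per-pixel noise is ±20, but averaged over each 8x8 block it shrinks to a fraction of a gray level, so almost no bits flip.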
The big weakness is that most perceptual hashing algorithms aren’t content aware, so you can easily defeat them by adding or removing background objects that might not be noticed or considered meaningful by a human observer.
Could probably get one of the many repos up and running pretty quickly [1].
Potentially what you could do is generate smaller versions of the images, test their hash matching under different conditions against multiple algorithms, and then pick the parameters that give you the fewest hash collisions.
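That sweep could be sketched like this, using a toy difference hash (dHash) on synthetic data; in practice you'd swap in real hash implementations and real thumbnails:

```python
import random

def dhash(pixels, hash_size=8):
    """Toy difference hash: sample a (hash_size+1) x hash_size grid and
    set a bit when a sample is brighter than its right-hand neighbour."""
    h, w = len(pixels), len(pixels[0])
    xs = [bx * ((w - 1) // hash_size) for bx in range(hash_size + 1)]
    ys = [by * (h // hash_size) for by in range(hash_size)]
    bits = 0
    for y in ys:
        for x0, x1 in zip(xs, xs[1:]):
            bits = (bits << 1) | (pixels[y][x0] > pixels[y][x1])
    return bits

def hamming(a, b):
    return bin(a ^ b).count("1")

def add_noise(pixels, amp):
    return [[min(255, max(0, p + random.randint(-amp, amp))) for p in row]
            for row in pixels]

random.seed(1)
img = [[(x + y) * 2 for x in range(64)] for y in range(64)]  # stand-in thumbnail
base = dhash(img)

# Sweep the noise amplitude and see how far the hash moves: too little
# noise and the hash doesn't change at all (a guaranteed match against
# your own original), more noise moves it further.
dists = {amp: hamming(base, dhash(add_noise(img, amp))) for amp in (5, 20, 60)}
print(dists)
```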
You and several of your siblings here are missing the point - this is not about resisting or obfuscating image content or subject or fooling an AI classifier, etc.
This imagemagick command is an attempt to remove digital forensic clues that would tie, for instance, an image posted by one pseudonym to an image posted by another pseudonym.
At what confidence level can a raw HEIC from my iPhone be tied to the JPEG that results from this cropping, resizing, noise and attenuation?
At what confidence level can one such transformed JPEG be tied to another such transformed JPEG? (assuming that you scramble the values for quality/shave/resize/attenuate ...)
This is tangential to the OP and the discussion - forgive me - but I think it's an interesting tangent.
If the image is watermarked, you can't remove it that way. Watermarks easily survive uniform noise higher than humans can tolerate. Watermark data is typically stored redundantly in multiple locations and channels, so uniform noise mostly averages itself out, and cropping won't do much. Watermarks often add signal in a different color model than RGB and in a heavily transformed domain of the image, so you're not adding noise along the "axis" of the watermark's signal.
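A toy demonstration of that averaging effect: one payload bit embedded redundantly as a tiny bias across thousands of sites, then hit with uniform noise an order of magnitude stronger. (Real schemes are blind and embed in transformed domains and non-RGB channels, as described above; this non-blind sketch only shows why zero-mean noise cancels.)

```python
import random

random.seed(42)

# Toy spread-spectrum-style watermark: a single payload bit embedded
# redundantly as a tiny +/- DELTA across thousands of sites.
N = 4096      # embedding sites (pixels, coefficients, ...)
DELTA = 2     # per-site watermark strength, far below visibility
NOISE = 25    # uniform noise amplitude, much stronger than DELTA

bit = 1
host = [random.randint(30, 225) for _ in range(N)]  # stand-in cover values
marked = [p + (DELTA if bit else -DELTA) for p in host]

# Attacker adds uniform noise an order of magnitude stronger than DELTA.
attacked = [p + random.randint(-NOISE, NOISE) for p in marked]

# A detector that can form the residual (here: non-blind, it knows the
# host) just averages over all sites: the zero-mean noise cancels out
# while the consistent +DELTA bias survives.
residual = sum(a - h for a, h in zip(attacked, host)) / N
recovered = 1 if residual > 0 else 0
print(recovered, round(residual, 3))
```

The noise averaged over 4096 sites has a standard deviation of a fraction of a gray level, so the ±2 bias dominates and the bit is recovered reliably.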
For similarity search, it also won't do much. Algorithms for this look for dozens of "landmarks", and then search for images that share a high percentage of them. The landmarks traditionally were high-contrast geometric features like corners, which wouldn't be affected by noise. Nowadays, landmarks can be whatever a neural network learns to pick when trained against typical deformations like compression and noise.
Does this remove the color profile, though? I strip all mine with exiftool, but I exclude the color profile otherwise the entire image is screwed, especially if it's in some odd colorspace.
[1] https://fotoforensics.com/tutorial.php?tt=estq