Hacker News

Side question: can anyone recommend any open-source de-duplication tool(s)? I've realized that I have the same data spread over many drives, but manually going through even a single drive would take a ton of time. I'm wondering if there's something smart enough that you give it the paths to scan and it magically outputs de-duplicated data to a single coherent place...

Edit: Some corrections. I forgot to mention which OS: GNU/Linux and/or BSDs.



When I researched this many years ago, jdupes turned out to be the best solution I could find for my use case.

https://github.com/jbruchon/jdupes

Though it works fine from a scripting perspective, I'd like a more interactive way of sorting directories etc. Identifying duplicates is just the first step. jdupes helps with linking the files (both soft and hard links come with caveats though!), but that is mostly to save space, not to help with reorganisation.
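To make the hard-link caveat concrete, here's a minimal Python sketch. It is not how jdupes works internally, and the function and file names are made up for illustration. The point is that once a duplicate is replaced by a hard link, both names refer to the same inode, so an in-place edit through one name shows up under the other.

```python
import os
import tempfile

def replace_with_hardlink(keep, dup):
    """Replace `dup` with a hard link to `keep` (same filesystem required)."""
    tmp = dup + ".tmp-link"   # hypothetical temp name; link first so a failure never loses `dup`
    os.link(keep, tmp)        # create a second name for the same inode
    os.replace(tmp, dup)      # rename it over the duplicate

def demo():
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        for path in (a, b):
            with open(path, "w") as f:
                f.write("same content")
        replace_with_hardlink(a, b)
        same_inode = os.path.samefile(a, b)
        with open(a, "a") as f:   # in-place edit through the first name
            f.write(" + edit")
        with open(b) as f:        # read back through the second name
            seen_via_b = f.read()
        return {"same_inode": same_inode, "seen_via_b": seen_via_b}

if __name__ == "__main__":
    print(demo())
```

So if you later treat the two paths as independent copies (say, one as a backup of the other), they no longer are.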


It seems to me that de-duplication + reorganization is not a trivial problem to solve. Maybe I'm wrong. It also seems like the kind of problem where it would be super easy to screw things up if you go with a custom-made script gluing different tools together...
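For what it's worth, the identification half can at least be kept non-destructive. Below is a rough Python sketch that only reports duplicate groups and never touches a file. It is my own assumption of a reasonable approach (group by size, then by SHA-256), not the algorithm of any tool mentioned in this thread, and `find_duplicates` is a made-up name.

```python
import hashlib
import os
from collections import defaultdict

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(roots):
    """Return sorted groups of paths with identical content. Read-only."""
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                # Skip symlinks and anything that is not a regular file.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)

if __name__ == "__main__":
    import sys
    for group in find_duplicates(sys.argv[1:]):
        print("\n".join(group), end="\n\n")
```

The reorganization step (which copy to keep, and where it should live) is still the hard part, and I'd review the report by hand before deleting or linking anything.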


I've never tried it myself, but the README mentions several other tools.

https://github.com/dpc/rdedup/



I used DupeGuru (https://dupeguru.voltaicideas.net/) in the past, but I'm not sure it's the best solution for you. Try it; it's open source.


rmlint.

I've tried many, but rmlint is the most flexible and reliable. The tagging especially works really well.

https://github.com/sahib/rmlint


What OS? For use in a console, there's rdfind or fdupes.



