I wrote a similar (but simpler) script which would replace a file by a hardlink if it has the same content.
My main motivation was for the packages of Python virtual envs, where I often have similar packages installed, and even if versions are different, many files would still match. Some of the packages are quite huge, e.g. Numpy, PyTorch, TensorFlow, etc. I got quite some disk space savings from this.
If you'd like. In the blog post he says he wrote the prototype in an afternoon. Hyperspace does try hard to preserve unique metadata as well as other protections.
My main motivation was for the packages of Python virtual envs, where I often have similar packages installed, and even if versions are different, many files would still match. Some of the packages are quite huge, e.g. Numpy, PyTorch, TensorFlow, etc. I got quite some disk space savings from this.
https://github.com/albertz/system-tools/blob/master/bin/merg...