I assumed the goal was archiving games for preservation.
If the goal is algorithmic erasure of data rather than preservation of games, then “archive” might again cause the kind of confusion I probably have.
If there is a strong business case for deduplication, I recommend hiring a consultant with expertise and experience in the problem.
To be clear, search is the only way to identify redundancy. If you have search, then redundancy is not a problem.
Often these archive projects have the goal of propagating the archive across many different systems, for which reducing size is very valuable. This is basically a compression exercise in the case where there are many large duplicated blocks.
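To put a rough number on "many large duplicated blocks", here is a minimal sketch that hashes fixed-size blocks and counts how many bytes are unique. The function name `dedup_stats`, the 4 KiB block size, and the synthetic test data are my own assumptions for illustration, not anything from the original post. Real tools often use content-defined chunking instead, which this sketch does not attempt.

```python
import hashlib
import io


def dedup_stats(stream, block_size=4096):
    """Return (total_bytes, unique_bytes) for fixed-size blocks of a binary stream.

    Hypothetical helper: block_size is an assumed value, not a recommendation.
    """
    seen = set()          # digests of blocks already encountered
    total = 0             # bytes read so far
    unique = 0            # bytes belonging to first-seen blocks
    while True:
        block = stream.read(block_size)
        if not block:
            break
        total += len(block)
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            seen.add(digest)
            unique += len(block)
    return total, unique


# Synthetic data: three identical blocks of "A" followed by one block of "B".
data = b"A" * 4096 * 3 + b"B" * 4096
total, unique = dedup_stats(io.BytesIO(data))
print(f"{unique} of {total} bytes are unique")
```

On a real collection you would pass an opened file (`open(path, "rb")`) instead of the `BytesIO` object. Note that the set of digests grows with the number of unique blocks, so for a multi-terabyte input it may need to live on disk rather than in memory.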
This is correct - the main goal is to make this rather compact while still having good read times.
This will allow me to store it on my new 4 TiB NVMe drive.
A lot of iterative scanning will happen, because I search for interesting information, which helps reverse engineering.
Also it allows me to share this with other people over the internet before I kick the bucket in a few decades... transferring 10.4 TiB would be rather boring :D
The line you referenced is a statement - not a question.
The third line of the post is "The goals are: - bring the size down - retain good read speed (for further processing/reversing) - easy sharable format - lower end machines can use it"
I think the confusion here (for me too) is what precisely is the purpose/meaning of "bring the size down?"
i.e., is this a "make it easier to search" question or a "we need to take up fewer bytes" question (which, perhaps, used to be the same question, but aren't necessarily now)?