I assumed the goal was archiving games for preservation.
If the goal is algorithmic erasure of data rather than preservation of games, then the word “archive” may keep causing the kind of confusion I probably have.
If there is a strong business case for deduplication, I recommend hiring a consultant with expertise and experience in the problem.
To be clear, search is the only way to identify redundancy. If you have search, then redundancy is not a problem.
Often these archive projects have the goal of propagating the archive across many different systems, for which reducing size is very valuable. This is basically a compression exercise in the case where there are many large duplicated blocks.
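As a rough illustration of the "many large duplicated blocks" case, here is a minimal sketch of fixed-size block deduplication. The function names (`dedup_blocks`, `rebuild`) and the 4096-byte block size are my own assumptions for the example, not anything the original poster described, and real tools generally use more elaborate chunking.

```python
import hashlib

def dedup_blocks(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep each unique block once."""
    store = {}   # SHA-256 digest -> block bytes (unique blocks only)
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # store a block only the first time it is seen
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the block store and the recipe."""
    return b"".join(store[digest] for digest in recipe)

# Hypothetical input: three identical "A" blocks and one "B" block.
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, recipe = dedup_blocks(data)
restored = rebuild(store, recipe)
```

Reading stays cheap in this scheme because rebuilding is one dictionary lookup per block, which is relevant if read speed matters as much as size.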
This is correct - the main goal is to keep this rather compact while still having good read times.
This will allow me to store it on my new 4 TiB NVMe drive.
A lot of iterative scanning will happen, because I search for interesting information, which helps reverse engineering.
Also it allows me to share this with other people over the internet before I kick the bucket in a few decades... transferring 10.4 TiB would be rather boring :D
The line you referenced is a statement - not a question.
The third line of the post is "The goals are: - bring the size down - retain good read speed (for further processing/reversing) - easy sharable format - lower end machines can use it"
I think the confusion here (for me too) is what precisely is the purpose/meaning of "bring the size down?"
aka, is this a "make it easier to search" question or a "we need to take up fewer bytes" question (which, perhaps, used to be the same question, but aren't necessarily now?)
Write Once is how to manage an archive.
Displaying a curated subset is a good interface. Good luck.