
Deduplication is a poor archival strategy. Storage is cheap: 10 TiB costs a couple of thousand dollars with reasonable redundancy.

Write Once is how to manage an archive.

Displaying a curated subset is a good interface. Good luck.




This was not helpful at all, and I think you also did not read the goals of this project.


I assumed the goal was archiving games for preservation.

If the goal is algorithmic erasure of data rather than preservation of games, then calling it an “archive” may keep causing confusion of the kind I probably have.

If there is a strong business case for deduplication, I recommend hiring a consultant with expertise and experience in the problem.

To be clear, search is the only way to identify redundancy. If you have search, then redundancy is not a problem.


Often these archive projects have the goal of propagating the archive across many different systems, for which reducing size is very valuable. This is basically a compression exercise in the case where there are many large duplicated blocks.
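To make the "compression exercise" concrete, here is a minimal sketch of block-level deduplication. It assumes fixed-size blocks and SHA-256 as the block identifier; the names `dedup_blocks` and `restore` and the 4096-byte block size are made up for illustration, not anything the project is known to use. Real tools often use content-defined chunking instead, so that an insertion in one file does not shift every later block boundary.

```python
import hashlib


def dedup_blocks(blobs, block_size=4096):
    """Split each blob into fixed-size blocks and keep each unique block once.

    Returns (store, recipes): store maps a SHA-256 digest to the block bytes,
    and recipes holds, per input blob, the ordered digests needed to rebuild it.
    """
    store = {}
    recipes = []
    for blob in blobs:
        recipe = []
        for offset in range(0, len(blob), block_size):
            block = blob[offset:offset + block_size]
            digest = hashlib.sha256(block).digest()
            # Only the first occurrence of a block is stored.
            store.setdefault(digest, block)
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def restore(store, recipe):
    """Rebuild one blob from its recipe (hypothetical helper)."""
    return b"".join(store[digest] for digest in recipe)


# Two toy "files" that share a 4096-byte block of b"A".
files = [b"A" * 8192 + b"B" * 4096, b"A" * 4096 + b"C" * 4096]
store, recipes = dedup_blocks(files)
stored_bytes = sum(len(block) for block in store.values())
original_bytes = sum(len(blob) for blob in files)
```

The recipes are what keep reads fast: rebuilding a file is a sequence of lookups, with no decompression of unrelated data.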


This is correct - the main goal is to keep this rather compact while still having good read times. This will allow me to store it on my new 4 TiB NVMe drive. A lot of iterative scanning will happen, because I search for interesting information, which helps reverse engineering. It also allows me to share this with other people over the internet before I kick the bucket in a few decades... transferring 10.4 TiB would be rather boring :D
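As a rough back-of-the-envelope check on the transfer point, the sketch below estimates how long a transfer takes at a given sustained link speed. The 100 Mbit/s figure is purely an assumption for illustration, and `transfer_days` is a hypothetical helper; protocol overhead is ignored.

```python
TIB = 2 ** 40  # bytes in one tebibyte


def transfer_days(size_tib, mbit_per_s):
    """Days needed to move size_tib TiB at a sustained rate of mbit_per_s."""
    bits = size_tib * TIB * 8
    seconds = bits / (mbit_per_s * 1_000_000)
    return seconds / 86_400


full_archive = transfer_days(10.4, 100)  # the 10.4 TiB mentioned above
fits_on_nvme = transfer_days(4.0, 100)   # upper bound if it fits in 4 TiB
print(f"{full_archive:.1f} days vs {fits_on_nvme:.1f} days")
```

Under that assumed link speed, the full 10.4 TiB takes roughly 10.6 days, against roughly 4.1 days for anything that fits in 4 TiB.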


Thanks, that makes sense.

The question began, "I am working on a game preservation project."


The line you referenced is a statement - not a question.

The third line of the post is "The goals are: - bring the size down - retain good read speed (for further processing/reversing) - easy sharable format - lower end machines can use it"


I think the confusion here (for me too) is what precisely is the purpose/meaning of "bring the size down"?

That is, is this a "make it easier to search" question or a "we need to take up fewer bytes" question (which perhaps used to be the same question, but aren't necessarily now)?


I think it's "I want people to make copies of this"


it's a question of both "i can store this on my 4 TiB NVMe disk" and "i can share this with other people over the internet in a timely manner"



