> so when our civilization is at risk of "[losing] the entire library" it probably would have already lost the ability to maintain the computer systems to access Wikipedia dumps

But as long as it continues to exist, some future civilization could figure out how to read the data again, eventually. Just like we eventually discovered how to read ancient languages that were once forgotten.



> But as long as it continues to exist, some future civilization could figure out how to read the data again, eventually. Just like we eventually discovered how to read ancient languages that were once forgotten.

Eh, I think you're vastly underestimating how difficult that would be.

1. The media would have to last hundreds of years at least, whereas modern archival media are hoped to last maybe fifty.

2. Even assuming the media did last, the new civilization would have to reverse-engineer encoding on top of encoding on top of encoding (e.g. physical disk encoding, complex filesystems, file formats, character encodings). Our civilization already has trouble reading some old file formats, and our disks already have trouble reading their own data (which is why they pack in a ton of error-correction information).

It took the Rosetta Stone to figure out how to read Egyptian hieroglyphics, even though the language was still alive in the form of Coptic.

3. Then there's the probability that the hard disks future archeologists find even have a Wikipedia dump on them. That probability will be very small, given how few people download these dumps.
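On point 2's aside about error correction: drives typically use much stronger codes (Reed-Solomon, LDPC), but the principle can be sketched with the textbook Hamming(7,4) code, which spends 3 parity bits per 4 data bits so that any single flipped bit can be located and fixed. The function names are hypothetical, for illustration only. A future reader would have to rediscover a layer like this before the bits beneath it even make sense.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits as 7 bits laid out as p1 p2 d1 p3 d2 d3 d4 (positions 1-7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c: list[int]) -> list[int]:
    """Fix at most one flipped bit, then return the 4 data bits."""
    c = list(c)  # don't mutate the caller's list
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    error_pos = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if error_pos:
        c[error_pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


# A damaged sector flips one bit; the decoder still recovers the data.
word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[5] ^= 1
print(hamming74_decode(stored) == word)  # → True
```

Two flipped bits in the same 7-bit word would be "corrected" to the wrong data, which is one reason real drives use heavier codes.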


If people still understand English in some form (a good bet: we still understand Latin, and English has more reach than Latin did at its peak), understanding charsets is pretty easy. Just assume it's a shift cipher.
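A minimal sketch of the shift-cipher idea, assuming the unknown charset assigns letters codes that are in order and contiguous, as ASCII and Latin-1 do: try all 256 byte offsets and keep the one whose output looks most like English. The frequency table is rounded and incomplete, the space weight is arbitrary, and the function names are hypothetical. A single offset would not fully recover an encoding like EBCDIC, whose lowercase letters sit in three separate runs.

```python
from collections import Counter

# Rounded, partial English letter frequencies in percent (illustrative assumption).
ENGLISH_FREQ = {
    "e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "i": 7.0, "n": 6.7,
    "s": 6.3, "h": 6.1, "r": 6.0, "d": 4.3, "l": 4.0, "u": 2.8,
}
SPACE_BONUS = 13.0  # spaces are common in prose; this weight is an arbitrary choice


def english_score(text: str) -> float:
    """Higher when the character mix looks more like English prose."""
    counts = Counter(text.lower())
    letters = sum(ENGLISH_FREQ.get(ch, 0.0) * n for ch, n in counts.items())
    return letters + SPACE_BONUS * counts[" "]


def crack_byte_shift(data: bytes) -> tuple[int, str]:
    """Try every byte offset and return (offset, text) for the most English-like result."""
    best_score, best_shift, best_text = float("-inf"), 0, ""
    for shift in range(256):
        # Undo the candidate offset; Latin-1 maps every byte value to some character.
        text = bytes((b - shift) % 256 for b in data).decode("latin-1")
        score = english_score(text)
        if score > best_score:
            best_score, best_shift, best_text = score, shift, text
    return best_shift, best_text


plain = b"the library survives if someone can still read the text on these disks"
secret = bytes((b + 37) % 256 for b in plain)
print(crack_byte_shift(secret)[0])  # → 37
```

This only works on a clean stream of text bytes; it says nothing about finding that stream inside a disk image in the first place.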

As far as media goes, that's true, but it's a bit of a numbers game. After all, you only need one unusually well-preserved specimen; the Dead Sea Scrolls survived, for instance. Not to mention intentional preservation efforts: I know GitHub has its Arctic Code Vault, and there's even a copy of Wikipedia on the Moon! https://meta.m.wikimedia.org/wiki/Wikipedia_to_the_Moon/Wrap...
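The "numbers game" (and the small probability in point 3) can be made concrete. Assuming each recovered drive independently holds a readable dump with probability p, the chance that at least one of n drives does is 1 - (1 - p)^n. Independence and the example value of p are illustrative assumptions (real losses are often correlated), and the function names are hypothetical.

```python
import math


def p_at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent copies survives,
    if each survives with probability p (an idealized assumption)."""
    return 1.0 - (1.0 - p) ** n


def copies_needed(p: float, target: float) -> int:
    """Smallest n with p_at_least_one(p, n) >= target, for 0 < p < 1 and 0 < target < 1."""
    n = math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
    # Nudge n to absorb floating-point rounding in the log ratio.
    while p_at_least_one(p, n) < target:
        n += 1
    while n > 0 and p_at_least_one(p, n - 1) >= target:
        n -= 1
    return n


# Hypothetical numbers: if only 1 drive in 100 carries a dump,
# you need 69 recovered drives for even odds of finding one.
print(copies_needed(0.01, 0.5))  # → 69
```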


> If people still understand English in some form (a good bet: we still understand Latin, and English has more reach than Latin did at its peak), understanding charsets is pretty easy. Just assume it's a shift cipher.

IIRC, Coptic is directly descended from Ancient Egyptian, but the Rosetta Stone was still needed to decipher hieroglyphics.

It won't be as simple as you think. The problem will be more like: here's 10TB of partially corrupted binary data, find the text when you don't know the encoding (oh, and the text may be compressed with an algorithm you also don't know).
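Even the easy half of that problem takes some work. A minimal sketch, assuming the searcher already knows a few compression formats (gzip, bzip2 and xz, recognized by their magic numbers) and that "text" means printable ASCII: scan the blob for readable runs, the way the Unix `strings` tool does, and try a decompressor at every magic-number hit. The helper names are hypothetical. A future civilization would not have even this much, since it would have to rediscover the formats themselves.

```python
import bz2
import lzma
import zlib

# Magic numbers of a few known compressed formats, each paired with a factory
# for a streaming decompressor that stops at the end of its own stream.
DECODERS = {
    b"\x1f\x8b": ("gzip", lambda: zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)),  # | 16 = expect gzip header
    b"BZh": ("bzip2", bz2.BZ2Decompressor),
    b"\xfd7zXZ\x00": ("xz", lambda: lzma.LZMADecompressor(format=lzma.FORMAT_XZ)),
}


def printable_runs(blob: bytes, min_len: int = 8) -> list[tuple[int, bytes]]:
    """Return (offset, run) for each run of printable ASCII at least min_len bytes long."""
    runs, start = [], None
    for i, b in enumerate(blob + b"\x00"):  # sentinel byte closes a trailing run
        if 32 <= b < 127 or b in (9, 10, 13):
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, blob[start:i]))
            start = None
    return runs


def find_compressed_streams(blob: bytes) -> list[tuple[int, str, bytes]]:
    """Try to decompress at every magic-number hit; return (offset, format, output)."""
    found = []
    for magic, (name, make_decoder) in DECODERS.items():
        pos = blob.find(magic)
        while pos != -1:
            try:
                out = make_decoder().decompress(blob[pos:])
                if out:  # ignore hits that decode to nothing
                    found.append((pos, name, out))
            except (zlib.error, OSError, lzma.LZMAError, EOFError):
                pass  # a false-positive match or a corrupted stream
            pos = blob.find(magic, pos + 1)
    return found
```

On real corrupted media, a single bad byte early in a compressed stream can make everything after it unrecoverable, which is the thread's point about corruption and unknown algorithms compounding.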
