Hacker News
A million ways to die on the web (archiveteam.org)
242 points by hexage1814 on Jan 18, 2024 | hide | past | favorite | 98 comments


There is a tale - perhaps apocryphal - handed down between generations of AWS staff, of a customer that was all-in on spot instances, until one day the price and availability of their preferred instances took an unfortunate turn, which is to say, all their stuff went away, including most dramatically the customer data that was on the instance storages, and including the replicas that had been mistakenly presumed a backstop against instance loss, and sadly - but not surprisingly - this was pretty much terminal for their startup.


The third largest bitcoin exchange made a change to their RAM settings in EC2. This shut down the machines, wiping out the hard drive and RAM. Their wallet was stored there. They lost everything.

https://siliconangle.com/2011/08/01/third-largest-bitcoin-ex...


So funny reading comments from that era on the articles about this incident. Some say they were lucky because they use Mt. Gox! Another observation: at the time, many perceived bitcoin as having the reputation of in-game currency.


Third largest bitcoin exchange...in 2011.


...with 17,000 BTC.

(Valued at 220k USD then, or 700M USD today)


Well the bitcoin supply is limited, so it's quite a big loss.


For them! Due to the way the economy works, destroying money is actually a donation to everyone else who has money (deflation).


Still hard to imagine this wasn't just laughed out of the room immediately.


It's infinitely divisible, so not in the same sense as the loss of any other limited item. There could be only one Bitcoin in existence and it wouldn't change the utility of the system.


How do you send half a satoshi?


Get all the verifier nodes to agree to increase the divisibility of BTC into sub-satoshi (e.g. via a BIP request approval). This will not be necessary for many generations.
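For the curious, the arithmetic checks out: Bitcoin consensus encodes on-chain amounts as 64-bit integers of satoshis, so sub-satoshi units would indeed require a consensus change. A quick Python sketch (the 21M cap and 1e8 satoshis/BTC are protocol constants; the millisatoshi aside refers to the Lightning Network's off-chain denomination):

```python
# On-chain amounts are 64-bit signed integer counts of satoshis,
# so 1 satoshi = 1e-8 BTC is the smallest representable unit today.
SATS_PER_BTC = 100_000_000
MAX_SUPPLY_BTC = 21_000_000

total_sats = MAX_SUPPLY_BTC * SATS_PER_BTC
print(total_sats)                       # 2100000000000000 (2.1e15)
print(total_sats < 2**63 - 1)           # True: fits comfortably in int64

# There is even headroom to redenominate: multiplying every amount by
# 1000 (millisatoshis, as Lightning already does off-chain) still fits
# in the same 64-bit field.
print(total_sats * 1000 < 2**63 - 1)    # True
```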


> This event will go down in a long litany of Bitcoin events that have caused massive losses in the BTC economy.

Reading this from 2011, and reading the latest on Web3IsGoingGreat.com today, is vaguely staggering. They never have learned.


Ooof... One copy and inside an instance? This person should not have been handling money.


How..? Was there no local code on any of the dev machines? No git? I'm asking because, for example, if GitHub were vaporized today, my product would lose roughly a day or two's worth of work, since we have something like 30 computers each holding a copy of the repository.

Of course, redeploying every single thing wouldn't be seamless, because there might be some configuration stored in services or similar, but I'd say ~90% of our automation is stored in Git.


They lost customer data, not source code. You shouldn't have a local copy of all user data on your machine.


But my customers enjoy the personal touch of me manually editing their SSN in my local MS Access database.


I mean, after all, if you don't have your own copy of the MS Access database then when your team scales beyond about 5 people that database is going to get harder to access. So really everyone should have a copy of all important PII. :P


If something is on the cloud does 3-2-1 backup stop applying?


Only if you're willing to stake your company's digital existence on the reliability of another company's cloud service.

If anything, it increases the need for 3-2-1 backups: the original copy of all of your files are on somebody else's computer that you have no control over. Hopefully they're keeping it backed up, and hopefully they don't go belly up and pull the plug all of a sudden. So you can use a primary backup in another cloud service from another company that hopefully won't kill their product at the same time as the other one (again, you have very little knowledge or control of the way they run their data center). Ultimately, it's a good idea to have a copy of your data that you have control over, maybe in a big drive (or set of drives, tapes, etc) in the safe, rotated daily/weekly/however long your company can cope with losing in a major SHTF situation.

Excessive? Maybe. For what it's worth my shop is locally hosted with both local and cloud backups. I have never regretted having at least one backup of anything and it's saved my bacon (or my coworkers', boss', etc.) a number of times. I've been fortunate to never need to rely on a secondary backup, but I sure wouldn't bet the company on it.


I would also like to know the answer. Would it be a good idea for the company to keep _encrypted_ backups on their machines/HDDs? Not a laptop somewhere, but something just a bit more involved.


It would make sense to keep a backup on a hard drive stored in a safe in the office. Doing it weekly would be reasonable, but you'd have to accept losing up to a week's worth of data.

The main problem is that you'd outgrow a single hard drive, so you'd need a NAS. Also, the transfer speed could become an issue as the database gets bigger. Even if you don't store all customer data, it does make sense to store all the configuration, keys, and secrets.
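As a rough illustration of the "hard drive in a safe" approach, here is a minimal Python sketch of a dated weekly archive with retention pruning (the paths and retention count are hypothetical; a real setup should also encrypt the archive, e.g. by running it through gpg, before it leaves the machine):

```python
import glob
import os
import tarfile
import time

def weekly_backup(src_dir: str, dest_dir: str, keep: int = 8) -> str:
    """Write a dated .tar.gz of src_dir into dest_dir (e.g. a mounted
    drive that then goes into the office safe) and prune old archives."""
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, time.strftime("backup-%Y%m%d.tar.gz"))
    with tarfile.open(path, "w:gz") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    # Retention: keep only the newest `keep` archives so the drive
    # doesn't fill up. NOTE: encrypt before storing if this holds
    # keys or customer data (e.g. gpg --symmetric on the tarball).
    archives = sorted(glob.glob(os.path.join(dest_dir, "backup-*.tar.gz")))
    for old in archives[:-keep]:
        os.remove(old)
    return path
```

Swapping `dest_dir` for a NAS mount point is the natural next step once a single drive stops being enough.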


Yes. Having a copy you can "touch" is important. At the absolute minimum you should have it on another cloud service.


i think for company-critical databases, the best you can do without invoking a terrible headache for your security officer is going multi-cloud: one big tech cloud, and one smaller firm that is completely disconnected from the other one

maybe they could even use a relatively inexpensive colo/baremetal provider to simply mirror the bigtech deployment on a smaller scale (would need to be quite flexible/vendor-agnostic to make that work...)


You can still do off-site backup to another cloud.


Ah that makes more sense, I can't read. I thought that the project stopped working all together, hence the startup was finished. I didn't realize it meant that they simply lost enough customers to go under.


A company's source code is mostly valueless. A company's customer data is priceless.

As Fred Brooks said in Mythical Man-Month: Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.


Crazy, but even AWS resources are not unlimited. In 2023, I experienced multiple days where g4dn instances were not available in us-east-1 (in any AZ).


Heh, they still live in the real world, where they have to order or build servers, put them in a rack, and configure them to be added to the pool.

At their scale lots of their stuff is custom and needs to be ordered at least 18 months in advance.

The fact that they can do capacity planning two to three years in advance with so few misses that people are astonished when they do have a capacity miss is a testament to how good they are at it.


I think a lot of people who have never really used anything other than a major cloud provider don't really understand what goes on behind the scenes. I was a sysadmin at a hosting provider and even though we were magnitudes smaller than AWS, we still saw a lot of the same issues, just on a smaller scale.


Yeah, for sure. I've got a couple team members that I brought along from cloud only, (one having never run any unmanaged k8s even) to building and maintaining on prem compute clusters.

We had a lot of discussions along the lines of 'Yeah, that's a great idea, but we can't just autoscale out of bad config or planning' or 'We can't just reboot a host and get a new one; we have to plan to take care of the ones we have.'

It's been pretty fun


Were they paying for said spot instances? If so, that's mostly on AWS, not the client. (Unless AWS explicitly says in its TOU that instances can be taken away instantly due to "availability" issues. Which IMHO would be a suicidal policy for AWS to have.)


It is not suicidal, and it is in fact what the terms say. You get a 120-second notice, and there's no capacity SLA even for on-demand or so-called "reserved instances". You need to set up https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capa... to get a guarantee, and they will obviously bill you for it even if you're not using it.
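For reference, EC2 publishes that 120-second interruption notice through the instance metadata service; here's a minimal polling sketch in Python (this assumes IMDSv1 for brevity — IMDSv2 additionally requires fetching a session token first, and the endpoint is only reachable from inside an instance):

```python
import json
import time
import urllib.error
import urllib.request

# Spot interruption notice endpoint; returns 404 until a notice exists.
SPOT_ACTION_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def parse_instance_action(body: str):
    """Parse the notice JSON, e.g.
    {"action": "terminate", "time": "2024-01-18T08:22:00Z"}."""
    doc = json.loads(body)
    return doc["action"], doc["time"]

def watch_for_interruption(poll_seconds: int = 5):
    """Poll the metadata service; when the notice appears you have
    roughly two minutes to checkpoint state and drain traffic."""
    while True:
        try:
            with urllib.request.urlopen(SPOT_ACTION_URL, timeout=2) as resp:
                action, when = parse_instance_action(resp.read().decode())
                print(f"Spot interruption: {action} at {when} -- flush state now")
                return action, when
        except urllib.error.HTTPError as e:
            if e.code != 404:   # 404 simply means no interruption scheduled
                raise
        time.sleep(poll_seconds)
```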


That's like a gamma ray burst hitting the planet and just vaporizing it. So unlucky. But obviously fate had other ideas.


Not really. With spot instances it was just waiting to happen on a day with more demand than usual. That is 100% expected.


More like a balancing rock statue in an area that can have earthquakes


But hey, think of the money they saved before that!


Having all your customers' data on instances local drives (sounds like they didn't use EBS) with no backups sounds pretty dumb, spot instances or not. Those weren't serious people.


Common mode failure.


There's also giving up in protest of a lawsuit, one of the oldest causes:

>Altern.org was a free web hosting service created in 1992 by Valentin Lacambre that disappeared in 2000. Valentin Lacambre, a pioneer of the free Internet in France, had to permanently close the service in early July 2000 following numerous lawsuits. The closure was due to the laws of the time, which placed on hosts the delicate obligation to act as judge, censor, and guilty-by-default party; it was difficult and contrary to his principles to police the 21,893 sites that existed on Altern.org at the time of closure.

http://yavista.com/98/1f/981fa5fe.html

https://fr.wikipedia.org/wiki/Altern

(Valentin Lacambre went on to be a co-founder of Gandi.net.)


> (Valentin Lacambre went on to be a co-founder of Gandi.net.)

Gandi.net which recently (last year) was purchased by some other corporate entity and subsequently raised all the prices + made previously free features paid. I only mention this as I discovered this week that my Gandi bill suddenly got a lot bigger.


Yeah I'm in the process of migrating away because of it. Sigh. Gandi had a good run.


Same here :/ Saw the writing on the wall when the previous purchase/sale happened to some investment company or whatever it was, but thought I could hold off my migration for a while longer...


It appears their "no bullshit" slogan is gone.


Tigerdirect comes to mind.


I'd add "maintenance burden". I have a few old properties I'm responsible for that used to be web apps, one is archived as static HTML, the other one still exists as an outdated web app. Every other week I receive a request to delete user data. At some point it might make sense to pull the plug on everything :(


Please do let ArchiveTeam know before you pull the plug so at least the public data can be saved.


Wait, Giphy was bought by Meta? What a shame! And they don't sell it off, as they're supposed to? Why am I not surprised. A lot of the web feels like it's being destroyed by FAANG. And Andreessen goes on claiming that the market prevents monopolies ...


Meta bought Giphy in 2020, then the UK CMA ruled that it was anti-competitive, so Meta sold Giphy to Shutterstock last year.


You're right! I read (in the original article?) that it wasn't completed yet, but it was:

> The acquisition was completed on June 23, 2023.


VCs have successful companies (that generally IPO) in one hand, and losers in the other.

They can use their influence on the people who hold voting rights to push the hot potato onto public investors.

That's one big reason you see public companies purchasing somewhat useless companies, or acquihiring them at insane valuations.


They forgot one: Natural disaster deleted all the data and there was no backup.

I was a paying subscriber to a great little site called magweb ~20 years ago. They scanned old and new (military) history and (war) game magazines and posted HTML versions, with permission of original publishers. It was really nice and explicitly allowed users to print or save copies of articles. No DRM. Perfect example of how a site for reading magazines online should work. Everything just basic HTML that even worked great to read in phone browsers 20 years ago.

Then Hurricane Katrina hit and the server that was apparently running out of the owner's basement somewhere in that area was flooded, with no working backups. I still have not found any traces of most of the 40000+ articles, other than the few I had saved, that used to be on that site. Since it was paywalled only a tiny part of the site is available on the wayback machine.

https://web.archive.org/web/20050529083811/http://www.magweb...

(No, I can't imagine running a business, scanning magazines for 9+ years, and not making sure to have backups of everything.)


There was also the OVHcloud data centre fire in 2021, although it's debatable if it'd fall under the same "natural disaster" category.

https://www.datacenterdynamics.com/en/news/ovhcloud-ordered-...


I know at least one startup that closed its doors because they lost both production and backup data on that OVH data center.


And that's why 3-2-1 specifies to have one _offsite_ backup


Preferably with a completely different provider, in case something takes out their entire network somehow (or they go out of business).

But make sure you check where the other provider is hosted: I read about one user (an individual, not a startup or other company) who kept backups of a shared host on a second shared host. It turned out they were both running on servers in that one OVH DC, and both were cheap hosts with no better backup/DR plan of their own...



I'm gonna Craigslist all your stuff! https://xkcd.com/1150/


I was personally affected by this fire, although I've always kept 3 months of backups of production data, encrypted, on-site, just in case of emergencies like this. I haven't touched their services for anything production-related since.


Similarly: Terrorist attack. After 9/11 I remember that some websites for WTC-based companies went offline, presumably because they were hosted on computers in the building.


Rather infamously, NYC's Office of Emergency Management was located in 7 WTC, and had critical communications facilities (broadcast tower) atop 1 WTC:

Since 1999, New York City’s Office of Emergency Management, charged with coordinating all aspects of the response, had occupied permanent headquarters in Seven World Trade Center, on Greenwich Street, just north of the landmark twin towers. A vital communications link was the radio repeater system based on the ground floor of One World Trade Center, the north tower. The loss of those facilities – and key personnel working there – significantly hampered the response.

<https://theconversation.com/disaster-communications-lessons-...>

<https://www.computerworld.com/article/2510996/9-11--top-less...>

I seem to recall an instance (I thought it was the NYC OEM, but it may have been another) in which both the primary and secondary data archives were within the WTC complex. Best practices now are to locate secondary / backup services at least 100--200 km from the primary. Preferably within different watersheds, seismic regions, etc.

Widespread disasters are comparatively rare, but can affect considerable areas, and locations sufficiently proximate might well be affected.


It's a shame that none of substitutes for DNS seem to have gained any traction so far.

Renting names that serve as resource identifiers, locators and trademarks all at the same time is just not a good idea.


Maybe not a full kill, but moderation:

Habbo Hotel: it was reported that sexual predators were using the service to groom children, and instead of tackling the issue, they applied a worldwide mute. You could walk but not talk.

Habbo still exists, but it almost killed the whole fanbase. https://en.wikipedia.org/wiki/Habbo#Moderation

And back then, it required Shockwave.


My Yahoo e-mail account was Yahoo-ed: one day I logged in to find Yahoo had deleted all my e-mails (some of them important, others of sentimental value) due to account inactivity, based on a new arbitrary number of days I was never told about.


I fear this for my gmail. I now use mbsync (lieer[0]) to have my emails synced locally on my homeserver, and then browse it with notmuch[1]. It's an incredibly freeing experience to have all your email on your own machine.

0: https://github.com/gauteh/lieer

1: https://notmuch.readthedocs.io/en/latest/man1/notmuch.html


For the exact same reason I started using MacOS Mail app with gmail, but I am realising now that it doesn’t download “all” the emails unfortunately. Lieer looks like a good option, thank you.


It's a great tool, it's just agonizingly slow sometimes because Google likes to throttle the connection when you make big changes. The initial download especially is slow, but the small changes thereafter are pretty fast.


my geocities account was destroyed, putting an end to my career as webmaster. I grew a resentment towards that website


Incidentally my angel fire site is still up for some insane reason


Same! Coincidentally, I recently backed up my Angelfire site from the late '90s. A lot of the original links were missing but thankfully they provide a `sitemap.xml` and I used HTTrack to make a local copy.


Omg I haven’t heard the name Angel fire in YEARS


Browsing into my 'Dead Bookmarks' folder, most of the links were either websites without enough funding to keep the servers online, acquired startups, or streaming services that were killed off by the giants' lawyers. Bash.org is the latest to experience the drag and drop of death.


Oh no! I just lost CG Society and now bash.org is dead? I feel like all the places teenage me used to hang out are slowly dying out, and it's sad, because back then we believed everything on the internet was forever.


The stuff you want to delete from the internet (embarrassing photos, bad takes from a decade ago) is forever, but the stuff you want to keep (great hangouts, cool personal webpages) is fleeting.


This is the difference between security vs archival perspectives.

For security you should assume someone recorded it indefinitely.

For archival you should assume nobody recorded it including the original creator.


I wish I could've put it so well myself!


When did B.O. die?

Best source I seem to find is an HN posting from 4 months ago:

<https://news.ycombinator.com/item?id=37295238>

AzureDiamond/hunter2: RIP


"Associated data" is a point I would not have immediately thought of. As data becomes more and more connected and services consolidated this becomes more important to consider.


I'm frustrated by the fact there are zero archives out there of TwitterX, Instagram or Facebook. Even big brands have shuttered their accounts and now none of their content exists anywhere any longer.


Digged - via website redesign: https://en.wikipedia.org/wiki/Digg#Redesign


Interesting that the article does not show digg, which killed itself.

On a side note, the current form of Digg is interesting, yet somehow poorly done. Go there and sort by year: you get only things from 2024, since there's no way to get 2023 or a rolling year. Not to mention being able to choose all time or specific periods.


On another side note: I have been running an interesting link recommendation feed for many years now, and evidently in ~2018 someone curating links for Digg found my feed and started relying on it heavily to populate Digg's front page. It went on for months. Some days as much as half of the links I posted would subsequently show up on Digg, including links to unusual and old content (which was a strong signal that the overlap was no mere coincidence).

In 2019 Digg posted a job listing for a links curator, and I cheekily applied, noting that I'm already doing the job anyway, so they might as well pay me for it. They didn't take me up on it, but like magic, the poaching went away.


How do you even make such a list nowadays? In the past you could take stuff from forums, but now forums are dead. Facebook is trash, so apart from known sources (is RSS dead?), organically finding new stuff sounds harsh. Apart from maybe copying from Reddit / Digg / Wykop / some Spanish Reddit equivalent...


A few years ago I made a tool that fetches content from a big list of primary sources (via RSS or HTML), and pushes each link it finds through filters (keyword blacklist, duplicate check, etc). I made a UI that lets me accept or reject links Tinder style, and when I have a scrap of time to fill, I assess a few links.

I also have a small group of well-read friends who make an effort to send me stuff, that helps a lot too.
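A pipeline like that can be surprisingly small. Here's a sketch of the filtering stage in Python — the blacklist words and minimal RSS handling are illustrative assumptions, not the commenter's actual tool:

```python
import xml.etree.ElementTree as ET

BLACKLIST = {"sponsored", "giveaway"}   # hypothetical keyword filter

def extract_candidates(rss_xml: str, seen: set) -> list:
    """Pull (title, link) pairs out of an RSS feed, dropping links
    already seen and titles containing a blacklisted keyword."""
    out = []
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if not link or link in seen:
            continue                    # duplicate check
        if any(word in title.lower() for word in BLACKLIST):
            continue                    # keyword blacklist
        seen.add(link)
        out.append((title, link))
    return out
```

The surviving candidates would then go to the accept/reject UI for a human pass.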


The current Digg is doing its best to destroy the goodwill they collected with the slightly-less-current Digg.

I started going there regularly about a year and a half ago, mostly because they would feature articles that I wouldn't find in my other typical websites. But in the last six months they have been optimizing to death, cutting anything that's not an instant success and publishing variant after variant of anything that's mildly successful (yay, another article on Twitter memes!). Clickbait titles are there too, along with a new-ish comment section that's 90% spam.

It's been a sobering lesson on what happens when you put growth above everything.


Probably would categorize this under "teh futurez!1!" in the original article


I hosted my first personal website and my Star Treck (sic!) fansite on Tripod. At some point in time they also had a data loss and one of both (along with a lot of other sites) was gone.

This is not mentioned here https://wiki.archiveteam.org/index.php/Tripod — I think because the event precedes Archive Team's formation.


Tripod and Lycos were great for kids who were learning HTML and wanted to have their own website. I remember the annoying popups and then banner frames, and how I tried to copy and paste a lot of JS to remove them (so unfair for them, they were providing a free service!)


This should be renamed “ten ways to die on the web”


> teh futurez!1!

It's a sad list. Even Google, after it acquired YouTube, forced me to change my YouTube login to a gmail account.


There's also "Sea Change," where a Web site that was one way, changes to become another way.

An example is BBSpot.com. It's still up and going, but very different from many years ago.

I miss SatireWire. I think it's dead now (ERROR ESTABLISHING A DATABASE CONNECTION).


That happened to mp3.com as well - it was once a Bandcamp-like site, until it was acquired by CBS Interactive after a disastrous music locker service got it nuked from orbit by the RIAA. It's now, for all practical purposes, a parked domain.

I still have a CD-ROM from them (back then, everybody and his brother published a CD-ROM).


Look inside A Million Ways To Die. There isn't a Million Ways To Die.


Losing memory is an issue; the question is how to maintain the archive?


There are a few solutions:

https://www.arweave.org/

https://www.lighthouse.storage/

To me, they seem like the most useful stuff coming out of the blockchain industry.


Have ArchiveTeam save it to archive.org and donate to keep the service alive.


Tough, but true: by preserving it yourself.


+ Politically regulated out.

The political environment the site operates in turns hostile to website content or method of publishing. The operators face costs of compliance, loss of scope, or personal risk in continuing.

See: The UK Online Safety Bill [1] [2] and especially [3] "Ofcom's >1,500 page consultation on the Online Safety Act 2023, and why small companies don't have a chance"

[1] https://www.theverge.com/2023/10/26/23922397/uk-online-safet...

[2] https://www.eff.org/deeplinks/2023/09/uk-government-knows-ho...

[3] https://decoded.legal/blog/2023/11/ofcoms-%3E1500-page-consu...


This is one of the main reasons I created Linkwarden - an open-source collaborative bookmark manager to collect, organize and preserve webpages:

GitHub: https://github.com/linkwarden/linkwarden

Website: https://linkwarden.app



