There is a tale - perhaps apocryphal - handed down between generations of AWS staff, of a customer that was all-in on spot instances, until one day the price and availability of their preferred instances took an unfortunate turn. Which is to say, all their stuff went away - most dramatically the customer data that lived on instance storage, including the replicas that had been mistakenly presumed a backstop against instance loss. Sadly, but not surprisingly, this was pretty much terminal for their startup.
The third-largest Bitcoin exchange made a change to their RAM settings in EC2. This shut down the machines, wiping out the hard drives and RAM. Their wallet was stored there. They lost everything.
It's funny reading the comments from that era on articles about this incident. Some say they were lucky because they were using Mt. Gox! Another observation: at the time, Bitcoin was perceived by many as something akin to in-game currency, and it had that kind of reputation.
It's infinitely divisible, so not in the same sense as the loss of any other limited item. There could be only one Bitcoin in existence and it wouldn't change the utility of the system.
Get all the verifier nodes to agree to increase the divisibility of BTC into sub-satoshi units (e.g. via an approved BIP). This will not be necessary for many generations.
How..? Was there no local code on any of the dev machines? No git? I'm asking because, for example, if GitHub were vaporized today, my product would lose roughly a day or two's worth of work, since we have something like 30 computers each holding a copy of the repository.
Of course, redeploying every single thing would not be seamless, since there might be some configuration stored in services or something similar, but I'd say that ~90% of our automation is stored in Git.
I mean, after all, if you don't have your own copy of the MS Access database then when your team scales beyond about 5 people that database is going to get harder to access. So really everyone should have a copy of all important PII. :P
Only if you're willing to stake your company's digital existence on the reliability of another company's cloud service.
If anything, it increases the need for 3-2-1 backups: the original copies of all of your files are on somebody else's computer that you have no control over. Hopefully they're keeping it backed up, and hopefully they don't go belly up and pull the plug all of a sudden. So you can keep a primary backup in another cloud service from another company that hopefully won't kill their product at the same time as the first (again, you have very little knowledge or control of the way they run their data center). Ultimately, it's a good idea to have a copy of your data that you control, maybe on a big drive (or set of drives, tapes, etc.) in the safe, rotated daily/weekly/however long your company can cope with losing in a major SHTF situation.
Excessive? Maybe. For what it's worth my shop is locally hosted with both local and cloud backups. I have never regretted having at least one backup of anything and it's saved my bacon (or my coworkers', boss', etc.) a number of times. I've been fortunate to never need to rely on a secondary backup, but I sure wouldn't bet the company on it.
I would also like to know the answer. Would it be a good idea for the company to keep _encrypted_ backups on their machines/HDDs? Not a laptop somewhere, but something just a bit more involved.
It would make sense to keep a backup on a hard drive stored in a safe in the office. Doing it weekly would be reasonable, but you would have to accept losing up to a week's worth of data.
The main problem is that you would outgrow a single hard drive, so you would need a NAS. Transfer speed could also become an issue as the database gets bigger. Even if you don't store all the customer data, it does make sense to store all the configuration, keys, and secrets.
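For what it's worth, a minimal sketch of that kind of encrypted, take-it-to-the-safe backup might look like the following. The paths, the key handling, and the choice of the `cryptography` package's Fernet are assumptions purely for illustration, not a prescription:

```python
# Minimal sketch: encrypt a database dump before copying it to an offline drive.
# Assumes `pip install cryptography`; paths and key handling are illustrative only.
from datetime import date
from pathlib import Path

from cryptography.fernet import Fernet

DUMP = Path("/var/backups/db_dump.sql")   # produced by your usual dump tool
DRIVE = Path("/mnt/offline-drive")        # the drive that goes into the safe
KEY_FILE = Path("/root/backup.key")       # created once with Fernet.generate_key();
                                          # keep it OFF the drive itself

def encrypt_dump() -> Path:
    key = KEY_FILE.read_bytes()
    ciphertext = Fernet(key).encrypt(DUMP.read_bytes())
    out = DRIVE / f"db-{date.today().isoformat()}.enc"
    out.write_bytes(ciphertext)
    return out

if __name__ == "__main__":
    print(f"wrote {encrypt_dump()}")
```

Fernet reads the whole dump into memory, so for anything beyond toy sizes you would stream it in chunks (or just pipe the dump through gpg or age), but the shape is the same: encrypt first, then move the ciphertext onto media you physically control.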
I think for company-critical databases, the best you can do without giving your security officer a terrible headache is going multi-cloud: one big-tech cloud, and one smaller firm that is completely disconnected from the first.
Maybe they could even use a relatively inexpensive colo/bare-metal provider to simply mirror the big-tech deployment on a smaller scale (you would need to be quite flexible/vendor-agnostic to make that work...).
Ah, that makes more sense; I can't read. I thought that the project stopped working altogether, hence the startup was finished. I didn't realize it meant that they simply lost enough customers to go under.
A company's source code is mostly valueless. A company's customer data is priceless.
As Fred Brooks said in Mythical Man-Month: Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.
Crazy, but even AWS resources are not unlimited. In 2023, I experienced multiple days where g4dn instances were not available in us-east-1 (in any AZ).
Heh, they still live in the real world, where they have to order or build servers, put them in a rack, and configure them to be added to the pool.
At their scale lots of their stuff is custom and needs to be ordered at least 18 months in advance.
The fact that they can do capacity planning two to three years in advance with so few misses that people are astonished when a capacity miss does happen is a testament to how good they are at it.
I think a lot of people who have never really used anything other than a major cloud provider don't really understand what goes on behind the scenes. I was a sysadmin at a hosting provider and even though we were magnitudes smaller than AWS, we still saw a lot of the same issues, just on a smaller scale.
Yeah, for sure. I've got a couple of team members that I brought along from cloud-only (one had never even run unmanaged k8s) to building and maintaining on-prem compute clusters.
We had a lot of discussions along the lines of, "Yeah, that's a great idea, but we can't just autoscale out of bad config or planning," or "We can't just reboot a host and get a new one; we have to plan to take care of the ones we have."
Were they paying for said spot instances? If so, that's mostly on AWS, not the client. (Unless AWS explicitly says in its TOU that instances can be taken away instantly due to "availability" issues. Which IMHO would be a suicidal policy for AWS to have.)
It is not suicidal, and it is in fact what the terms say. You get a 120-second notice, and there's no capacity SLA even for on-demand or so-called "reserved instances". You need to set up https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-capa... to get a guarantee, and they will obviously bill you for it even if you're not using it.
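To make the 120-second window concrete, here is a rough sketch of watching for a spot interruption and kicking off a graceful drain. The metadata path is the documented spot instance-action endpoint, but the polling interval and the drain() hook are assumptions, and IMDSv2 token handling is deliberately omitted:

```python
# Sketch: poll the EC2 instance metadata service for a spot interruption notice
# and trigger a graceful shutdown step. IMDSv2 token handling is omitted; the
# 5-second polling interval and the drain() hook are illustrative assumptions.
import time
import urllib.error
import urllib.request

NOTICE_URL = "http://169.254.169.254/latest/meta-data/spot/instance-action"

def interruption_pending() -> bool:
    try:
        with urllib.request.urlopen(NOTICE_URL, timeout=2) as resp:
            return resp.status == 200   # body carries the action ("terminate") and time
    except urllib.error.HTTPError as err:
        return err.code != 404          # 404 means no interruption is scheduled
    except urllib.error.URLError:
        return False                    # metadata service unreachable; try again later

def drain() -> None:
    # Hypothetical hook: flush writes, deregister from the load balancer,
    # push anything that only lives on instance storage somewhere durable.
    print("interruption notice received, draining...")

if __name__ == "__main__":
    while not interruption_pending():
        time.sleep(5)
    drain()
```

Two minutes is enough to hand off work, not to copy a data set, which is the point of the thread above: data that only exists on instance storage is already as good as gone.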
Having all your customers' data on instances' local drives (it sounds like they didn't use EBS) with no backups sounds pretty dumb, spot instances or not. Those weren't serious people.
There's also giving up in protest of a lawsuit, one of the oldest causes:
>Altern.org was a free web hosting service created in 1992 by Valentin Lacambre; it disappeared in 2000. Valentin Lacambre, a pioneer of the free Internet in France, had to permanently close the free hosting service in early July 2000 following numerous lawsuits. The closure was due to the laws of the time, which placed on hosts the delicate obligation to act as judge, censor, and, by default, guilty party, as it was deemed difficult and contrary to his principles to police the 21,893 sites that existed on Altern.org at the time of closure.
> (Valentin Lacambre went on to be a co-founder of Gandi.net.)
Gandi.net, which recently (last year) was purchased by another corporate entity that subsequently raised all the prices and made previously free features paid. I only mention this because I discovered this week that my Gandi bill suddenly got a lot bigger.
Same here :/ Saw the writing on the wall when the previous purchase/sale happened to some investment company or whatever it was, but thought I could hold off my migration for a while longer...
I'd add "maintenance burden". I have a few old properties I'm responsible for that used to be web apps, one is archived as static HTML, the other one still exists as an outdated web app. Every other week I receive a request to delete user data. At some point it might make sense to pull the plug on everything :(
Wait, Giphy was bought by Meta? What a shame! And they don't sell it off, as they're supposed to? Why am I not surprised. A lot of the web feels like it's being destroyed by FAANG. And Andreessen goes on claiming that the market prevents monopolies ...
They forgot one: Natural disaster deleted all the data and there was no backup.
I was a paying subscriber to a great little site called magweb ~20 years ago. They scanned old and new (military) history and (war) game magazines and posted HTML versions, with permission of original publishers. It was really nice and explicitly allowed users to print or save copies of articles. No DRM. Perfect example of how a site for reading magazines online should work. Everything just basic HTML that even worked great to read in phone browsers 20 years ago.
Then Hurricane Katrina hit, and the server, which was apparently running out of the owner's basement somewhere in that area, was flooded, with no working backups. I still have not found any trace of most of the 40,000+ articles that used to be on that site, other than the few I had saved. Since it was paywalled, only a tiny part of the site is available on the Wayback Machine.
Preferably with a completely different provider, in case something takes out their entire network somehow (or they go out of business).
But make sure you check where the other provider is hosted: I read of one user (an individual, not a startup or other company) who backed up stuff on one shared host to a second shared host. It turned out that both were running on servers in that one OVH DC, and they were cheap hosts with no better backup/DR plan themselves...
I personally was affected by this fire, although I've always kept three months of backups of production data, encrypted, on-site, just in case of emergencies like this. I haven't touched their services for anything production-related ever since.
Similarly: Terrorist attack. After 9/11 I remember that some websites for WTC-based companies went offline, presumably because they were hosted on computers in the building.
Rather infamously, NYC's Office of Emergency Management was located in 7 WTC, and had critical communications facilities (broadcast tower) atop 1 WTC:
> Since 1999, New York City’s Office of Emergency Management, charged with coordinating all aspects of the response, had occupied permanent headquarters in Seven World Trade Center, on Greenwich Street, just north of the landmark twin towers. A vital communications link was the radio repeater system based on the ground floor of One World Trade Center, the north tower. The loss of those facilities – and key personnel working there – significantly hampered the response.
I seem to recall an instance (I thought it was the NYC OEM, but it may have been another organization) in which both the primary and secondary data archives were within the WTC complex. Best practice now is to locate secondary/backup services at least 100-200 km from the primary, preferably within different watersheds, seismic regions, etc.
Widespread disasters are comparatively rare, but can affect considerable areas, and locations sufficiently proximate might well be affected.
Habbo Hotel: It was reported that sexual predators were using the service for grooming, and instead of tackling the issue, they applied a worldwide mute. You could walk but not talk.
My Yahoo e-mail account was Yahoo-ed: one day, I logged in to find that Yahoo had deleted all my e-mails (some of them important and others of sentimental value) due to account inactivity for some new, arbitrary number of days I had never been told about.
I fear this for my gmail. I now use mbsync (lieer[0]) to have my emails synced locally on my homeserver, and then browse it with notmuch[1]. It's an incredibly freeing experience to have all your email on your own machine.
For the exact same reason I started using the macOS Mail app with Gmail, but I am realising now that it doesn't download “all” the emails, unfortunately. Lieer looks like a good option, thank you.
It's a great tool, it's just agonizingly slow sometimes because Google likes to throttle the connection when you make big changes. The initial download especially is slow, but the small changes thereafter are pretty fast.
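If anyone wants the lazy way to keep that local copy fresh, a loop along these lines is enough, assuming `gmi init` has already been run in the maildir; the path and the intervals (including the longer back-off for when Google throttles) are made up for illustration:

```python
# Sketch: keep a local copy of Gmail current via lieer's `gmi sync`, retrying
# more slowly when a sync fails (e.g. because the connection is being throttled).
# The maildir path and both intervals are illustrative assumptions.
import subprocess
import time
from pathlib import Path

MAILDIR = Path("~/mail/gmail").expanduser()  # hypothetical dir where `gmi init` was run
INTERVAL = 300                               # seconds between successful syncs
BACKOFF = 1800                               # wait longer after a failed sync

if __name__ == "__main__":
    while True:
        ok = subprocess.run(["gmi", "sync"], cwd=MAILDIR).returncode == 0
        time.sleep(INTERVAL if ok else BACKOFF)
```

A systemd timer or cron entry calling `gmi sync` does the same job without the loop; the point is just that the authoritative copy ends up on hardware you control.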
Same! Coincidentally, I recently backed up my Angelfire site from the late '90s. A lot of the original links were missing but thankfully they provide a `sitemap.xml` and I used HTTrack to make a local copy.
Browsing into my 'Dead Bookmarks' folder, most of the links were either websites without enough funding to keep the servers online, acquired startups, or streaming services that were killed off by the giants' lawyers. Bash.org is the latest to experience the drag and drop of death.
Oh no! I just lost CG Society and now bash.org is dead? I feel like all the places teenage me used to hang out are slowly dying out, and it's sad, because back then we believed everything on the internet was forever.
The stuff you want to delete from the internet (embarrassing photos, bad takes from a decade ago) is forever, but the things you want to keep (great hangouts, cool personal webpages) are fleeting.
"Associated data" is a point I would not have immediately thought of. As data becomes more and more connected and services consolidated this becomes more important to consider.
I'm frustrated by the fact there are zero archives out there of TwitterX, Instagram or Facebook. Even big brands have shuttered their accounts and now none of their content exists anywhere any longer.
Interesting that the article does not mention Digg, which killed itself.
On a side note, the current form of Digg is interesting, yet somehow poorly done. Go there and sort by year: you only get things from 2024, since there is no way to get 2023 or a rolling year, let alone choose all time or specific time periods.
On another side note: I have been running an interesting link recommendation feed for many years now, and evidently in ~2018 someone curating links for Digg found my feed and started relying on it heavily to populate Digg's front page. It went on for months. Some days as much as half of the links I posted would subsequently show up on Digg, including links to unusual and old content (which was a strong signal that the overlap was no mere coincidence).
In 2019 Digg posted a job listing for a links curator, and I cheekily applied, noting that I'm already doing the job anyway, so they might as well pay me for it. They didn't take me up on it, but like magic, the poaching went away.
How do you even make such a list nowadays? In the past you could take stuff from forums, but now forums are dead. Facebook is trash, so apart from known sources (is RSS dead?), organically finding new stuff sounds hard. Apart from maybe copying from Reddit / Digg / Wykop / some Spanish Reddit equivalent...
A few years ago I made a tool that fetches content from a big list of primary sources (via RSS or HTML), and pushes each link it finds through filters (keyword blacklist, duplicate check, etc). I made a UI that lets me accept or reject links Tinder style, and when I have a scrap of time to fill, I assess a few links.
I also have a small group of well-read friends who make an effort to send me stuff, that helps a lot too.
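A bare-bones version of that pipeline could look something like this; the feed URLs, the keyword blacklist, and the seen-links file are placeholders, and the real tool presumably does much more (HTML sources, the accept/reject UI, etc.):

```python
# Sketch: pull links from a list of RSS feeds, drop blacklisted keywords and
# duplicates, and print the survivors for manual review. All the constants
# below are placeholders for illustration.
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

FEEDS = ["https://example.com/feed.xml"]   # hypothetical sources
BLACKLIST = {"crypto", "giveaway"}         # keyword filter
SEEN_FILE = Path("seen_links.txt")         # crude duplicate check

def fetch_items(url: str):
    """Yield (title, link) pairs from an RSS 2.0 feed."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        root = ET.parse(resp).getroot()
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:
            yield title, link

def main() -> None:
    seen = set(SEEN_FILE.read_text().splitlines()) if SEEN_FILE.exists() else set()
    for url in FEEDS:
        for title, link in fetch_items(url):
            if link in seen or any(word in title.lower() for word in BLACKLIST):
                continue
            seen.add(link)
            print(f"{title}\n  {link}")    # hand these to the accept/reject step
    SEEN_FILE.write_text("\n".join(sorted(seen)))

if __name__ == "__main__":
    main()
```

The accept/reject step is where the human taste lives; the script just shrinks the pile.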
The current Digg is doing its best to destroy the goodwill they collected with the slightly-less-current Digg.
I started going there regularly about a year and a half ago, mostly because they would feature articles that I wouldn't find in my other typical websites. But in the last six months they have been optimizing to death, cutting anything that's not an instant success and publishing variant after variant of anything that's mildly successful (yay, another article on Twitter memes!). Clickbait titles are there too, along with a new-ish comment section that's 90% spam.
It's been a sobering lesson on what happens when you put growth above everything.
I hosted my first personal website and my Star Treck (sic!) fansite on Tripod. At some point they also had a data loss, and one of the two (along with a lot of other sites) was gone.
Tripod and Lycos were great for kids who were learning HTML and wanted to have their own website. I remember the annoying popups and then the banner frames, and how I copied and pasted a lot of JS to remove them (so unfair to them, since they were providing a free service!).
That happened to mp3.com as well - it was once a Bandcamp-like site, until it was acquired by CBS Interactive after a disastrous music locker service got it nuked from orbit by the RIAA. It's now, for all practical purposes, a parked domain.
I still have a CD-ROM from them (back then, everybody and his brother published a CD-ROM).
The political environment the site operates in turns hostile to the website's content or method of publishing. The operators face compliance costs, loss of scope, or personal risk in continuing.
See: The UK Online Safety Bill [1] [2] and especially [3] "Ofcom's >1,500 page consultation on the Online Safety Act 2023, and why small companies don't have a chance"