Whenever people say they don't need backups because they are 'cloud based', I always wonder what they'll do when their precious cloud provider messes up. The chances of this happening to Amazon, Google or Microsoft are small, but they're not zero; if it can happen to Cisco, it can likely happen anywhere.
I don't need backups of my (S3) backup, because it's more likely that my personal backup process has a flaw that will backfire and destroy my data than that it will one day save it from the inadequacy of Dynamo's ~RAID17.
Consider: each time you introduce a new device that has local, physical access to the place your data lives, that's one more thing that could Halt and Catch Fire at just the wrong time, or be replaced with a USB Killer or a DMA cryptolocker device by social engineering. If it involves data center operators you don't know, that's more people you have to trust not to break whatever they touch or have been paid off to steal your corporate secrets. Etc.
Sure, the probabilities are small, but so is the probability of the great data fortresses crumbling to ash and you being the Last Best Hope for your data. Hypothetical ameliorations of sub-lightning-strike probabilities often have failure modes that are more likely than the risks they guard against.
In that case I hope you have your S3 under a different account than your main stuff. There are more reasons why stuff goes missing than just hardware failure.
Note that a backup need not make things worse; it should only make things better.
> Consider: each time you introduce a new device that has local, physical access to the place your data lives
Right. So don't do that. Put it somewhere else, and configure the original device to push to it rather than giving the new device access to the original. If you use a service that implements the S3 API, you don't even need to install new software on the original, just configure an extra endpoint. Also, encrypt before pushing (that goes for S3 too).
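For instance, a minimal sketch of the push-plus-encrypt idea, assuming boto3 and the cryptography package; the endpoint, bucket, and key handling are placeholders, not a recommendation:

    # Minimal sketch: encrypt client-side, then push to any S3-compatible endpoint.
    # Endpoint URL, bucket name, and key handling are placeholders -- adapt to taste.
    import boto3
    from cryptography.fernet import Fernet  # pip install cryptography

    ENDPOINT = "https://s3.backup-provider.example"  # any service speaking the S3 API
    BUCKET = "offsite-backups"                       # hypothetical bucket

    key = Fernet.generate_key()   # in practice, load a key you keep somewhere safe
    cipher = Fernet(key)

    with open("important.db", "rb") as f:
        ciphertext = cipher.encrypt(f.read())

    s3 = boto3.client("s3", endpoint_url=ENDPOINT)
    s3.put_object(Bucket=BUCKET, Key="important.db.enc", Body=ciphertext)

The point being that the original machine only ever holds credentials that can write to the backup target; nothing new ever gets access to the original.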
Do you think you will do a better job of backing up your stuff than Google, MSFT, etc.? They have dedicated engineers and spend lots of money on this stuff.
Think of it from a statistical perspective: what is the probability of you setting up this backup system correctly vs. them?
All of these have happened to companies that I have worked with, so no, I won't do a better job of backing stuff up compared to Google, MSFT, etc., BUT I would rather have some get-out-of-jail-free card if any of the above should happen and suddenly, where there used to be data, there is nothing.
You should approach this from a cost-benefit perspective, not from a skills perspective.
They may have better engineering, but they also have extra risks. My home server will never ban me because it thinks I've violated its TOS, for example.
I actually really doubt that Google, Amazon et al have proper backups of every client's storage - I've never come across details or even an idea of such a system. They just have enough redundancy and, more importantly, a "never-delete" architecture - data is merely tagged for deletion for a significant amount of time before it's ever deleted, and various systems check consistency on an ongoing basis.
Of course, even that doesn't prevent you from fucking up - your datastore will do exactly what you tell it to. Nobody can prevent you from doing the equivalent of rm -rf on your S3 store, or accidentally deleting the only copy of that movie your client's been working on for the last four years, and nothing can protect you from it except a decent backup.
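For a sense of how little it takes, a sketch with boto3 (bucket name hypothetical, and obviously not something to run against anything you care about):

    # The S3 equivalent of rm -rf: one careless call and the bucket is empty.
    import boto3

    bucket = boto3.resource("s3").Bucket("client-movie-project")  # hypothetical bucket
    bucket.objects.all().delete()  # deletes every (current) object in the bucket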
Not sure about GCP, but Google certainly has backups for Gmail. I was affected by an outage where only a few accounts (maybe millions, but at least not a lot by Google standards) had emails deleted due to a software issue. They explained that recovery would take a few hours because data had to be restored from tape. At least, that's the message they showed when I tried to log in. Note that this was the free Gmail product, with no business support.
Even though there is some reward for expertise, backups are not difficult. What exactly do the Big 4 bring to the backup table that those of us with Amanda, rsync, or BackupExec couldn't manage ourselves?
Cost of resources aside, a person could run hourly full backups all day, every day, and have just as good a backup regime as a billion-dollar company. Time-to-restore is something that the aforementioned expertise factors into, but a good backup is the linchpin, and it can still be restored by whatever means.
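As a toy illustration of how low the bar is (paths hypothetical; a cron job driving rsync or Amanda would do the same thing far more efficiently):

    # Toy "hourly full backup": copy the data directory into a timestamped snapshot.
    # Paths are placeholders; rsync/Amanda on a cron schedule is the grown-up version.
    import shutil
    import time
    from datetime import datetime
    from pathlib import Path

    SRC = Path("/srv/data")       # hypothetical data directory
    DEST = Path("/mnt/backup")    # hypothetical backup volume

    while True:
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        shutil.copytree(SRC, DEST / stamp)  # one full copy per run
        time.sleep(3600)                    # once an hour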
Nobody said that you shouldn't have any data in the cloud. The argument is that you shouldn't have your data only in some cloud.
If you have your data in some cloud (either directly or as a backup) as well as in your really crappy backup solution that has a 10% failure rate, you are still ten times less likely to lose your data than by just keeping it in the cloud.
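Rough arithmetic, assuming the two failure modes are independent and picking a made-up number for the cloud side:

    # Back-of-the-envelope: you only lose data if BOTH copies fail.
    # The cloud figure is invented for illustration; independence is assumed.
    p_cloud_loss   = 0.001  # hypothetical chance the cloud copy is lost
    p_backup_fails = 0.10   # the "really crappy" home backup failing when needed

    p_lose_both = p_cloud_loss * p_backup_fails
    print(p_lose_both)      # 0.0001 -- ten times less likely than cloud-only (0.001)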
It's a good point, but I can't help but think of all the mistakes, disclosures, privacy violations, poor design, gratuitous change, etc. that have happened at the hands of "dedicated engineers." In this case specifically, are the engineers who caused this incident not dedicated and well paid?
Just to be clear: if you're using AWS, GCP, or Azure to host your own applications, it's on you to manage disaster & recovery. Those companies make doing that much easier than managing your own DC, and yes, the reliability is going to be better than DIY (but the risk is still never zero). I think you mean more the SaaS applications, or anything that "phones home" data to back it up, right?
We're going to start seeing more business continuity audits of SaaS players, akin to a BBB rating for the company's ability to maintain service levels. I thought I came across a website that actually has started doing this, but I can't recall which it was.
I think it's more about shifting blame than actually providing more reliable services. I regularly see S3 throw errors when reading and writing tens of thousands of files (Spark w/ Parquet). It's not that it's MORE reliable (although it is very reliable); it's just that when it isn't, it's somebody else's problem and responsibility to fix it.
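For what it's worth, those transient errors are the kind you're expected to absorb with retries yourself; a sketch with boto3/botocore, where the retry counts and object names are arbitrary examples:

    # Transient S3 errors (e.g. 503 SlowDown under heavy load) are normally retried away.
    # Retry counts and object names here are arbitrary examples, not recommendations.
    import boto3
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(retries={"max_attempts": 10, "mode": "standard"}))
    obj = s3.get_object(Bucket="my-bucket", Key="data/part-00000.parquet")  # hypothetical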
Centralization decreases frequency of failure but increases its cost. A really large scale nasty incident with a major cloud provider like Amazon could be a national state of emergency.
It really is only a matter of time. The problem with low-frequency events is that you never have any idea how realistic your modeling is, and the only way you'll find out is when that once-in-a-1000-years event happens tomorrow morning.
Agreed, there was no data loss, but those who relied solely on US-Standard with no replication elsewhere were dead in the water for the duration of the outage.
Sure, but that shows up in the risk calculations when you're choosing a cloud provider. I imagine for just about everyone it was cheaper to eat the loss on the day of the outage than to spend the time/effort/resources to do it right. Especially when it made national news that it was Amazon's fault so nobody blamed the sites that were down.
> Especially when it made national news that it was Amazon's fault so nobody blamed the sites that were down.
That's an interesting viewpoint. I really don't agree with it though. When your service is down that is your responsibility, never Amazon's. And when you lose data that is your responsibility too, not your cloud provider's.
S3 has had a data loss too, sadly - a console UI bug led to the wrong files getting deleted, if I recall correctly. Yet another reason for frontend web tech to improve.
Not sure why you're citing some unrelated incident - I think you're jumping to the conclusion that I'm talking about the one you cited, but I'm not. This one was not made public.
It was a bit surprising when I found out, but I think the really interesting part is that low-quality web tech is the weak link in the chain. That was an eye-opener.