Whenever people say they don't need backups because they are 'cloud based', I always wonder what they'll do when their precious cloud provider messes up. The chances of this happening to Amazon, Google or Microsoft are small, but they're not zero; if it can happen to Cisco, it can likely happen anywhere.
I don't need backups of my (S3) backup, because it's more likely that my personal backup process has a flaw that will backfire and destroy my data than that it will one day save it from the inadequacy of Dynamo's ~RAID17.
Consider: each time you introduce a new device that has local, physical access to the place your data lives, that's one more thing that could Halt and Catch Fire at just the wrong time, or be replaced with a USB Killer or a DMA cryptolocker device by social engineering. If it involves data center operators you don't know, that's more people you have to trust not to break whatever they touch or have been paid off to steal your corporate secrets. Etc.
Sure, the probabilities are small, but so is the probability of the great data fortresses crumbling to ash and you being the Last Best Hope for your data. Hypothetical ameliorations of sub-lightning-strike probabilities often have failure modes that are more likely than the risks they guard against.
In that case I hope you have your S3 under a different account than your main stuff. There are more reasons why stuff goes missing than just hardware failure.
Note that a backup need not make things worse; it should only make things better.
> Consider: each time you introduce a new device that has local, physical access to the place your data lives
Right. So don't do that. Put it somewhere else, and configure the original device to push to it rather than giving the new device access to the original. If you use a service that implements the S3 API, you don't even need to install new software on the original, just configure an extra endpoint. Also, encrypt before pushing (that goes for S3 too).
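For instance, a minimal sketch of the push-plus-encrypt idea, assuming boto3 and the cryptography package; the endpoint, bucket, and key handling are placeholders, not a recommendation:

    # Minimal sketch: encrypt client-side, then push to any S3-compatible endpoint.
    # Endpoint URL, bucket name, and key handling are placeholders -- adapt to taste.
    import boto3
    from cryptography.fernet import Fernet  # pip install cryptography

    ENDPOINT = "https://s3.backup-provider.example"  # any service speaking the S3 API
    BUCKET = "offsite-backups"                       # hypothetical bucket

    key = Fernet.generate_key()   # in practice, load a key you keep somewhere safe
    cipher = Fernet(key)

    with open("important.db", "rb") as f:
        ciphertext = cipher.encrypt(f.read())

    s3 = boto3.client("s3", endpoint_url=ENDPOINT)
    s3.put_object(Bucket=BUCKET, Key="important.db.enc", Body=ciphertext)

The point being that the original machine only ever holds credentials that can write to the backup target; nothing new ever gets access to the original.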
Do you think you will do a better job of backing up your stuff than Google, MSFT, etc.? They have dedicated engineers and spend lots of money on this stuff.
Think of it from a statistical perspective: what is the probability of you setting up this backup system correctly vs. them?
All of these have happened to companies that I have worked with, so no, I won't do a better job of backing stuff up compared to Google, MSFT, etc., BUT I would rather have some get-out-of-jail-free card if any of the above should happen and suddenly, where there used to be data, there is nothing.
You should approach this from a cost-benefit perspective, not from a skills perspective.
They may have better engineering, but they also have extra risks. My home server will never ban me because it thinks I've violated its TOS, for example.
I actually really doubt that Google, Amazon et al have proper backups of every client's storage - I've never come across details or even an idea of such a system. They just have enough redundancy and, more importantly, a "never-delete" architecture - data is merely tagged for deletion for a significant amount of time before it's ever deleted, and various systems check consistency on an ongoing basis.
Of course, even that doesn't prevent you from fucking up - your datastore will do exactly what you tell it to. Nobody can prevent you from doing the equivalent of rm -rf on your S3 store, or accidentally deleting the only copy of that movie your client's been working on for the last four years, and nothing can protect you from it except a decent backup.
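For a sense of how little it takes, a sketch with boto3 (bucket name hypothetical, and obviously not something to run against anything you care about):

    # The S3 equivalent of rm -rf: one careless call and the bucket is empty.
    import boto3

    bucket = boto3.resource("s3").Bucket("client-movie-project")  # hypothetical bucket
    bucket.objects.all().delete()  # deletes every (current) object in the bucket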
Not sure about GCP, but Google certainly has backups for Gmail. I was affected by an outage where only a few accounts (maybe millions, but at least not a lot by Google standards) had emails deleted due to a software issue. They explained that recovery would take a few hours because data had to be restored from tape. At least, that's the message they showed when I tried to log in. Note that this was the free Gmail product, with no business support.
Even though there is some reward for expertise, backups are not difficult. What exactly do the Big 4 bring to the backup table that those of us with Amanda, rsync, or BackupExec couldn't manage ourselves?
Cost of resources aside, a person could run hourly full backups all day, every day, and have just as good a backup regime as a billion-dollar company. Time-to-restore is something that the aforementioned expertise factors into, but a good backup is the linchpin, and it can still be restored by whatever means.
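As a toy illustration of how low the bar is (paths hypothetical; a cron job driving rsync or Amanda would do the same thing far more efficiently):

    # Toy "hourly full backup": copy the data directory into a timestamped snapshot.
    # Paths are placeholders; rsync/Amanda on a cron schedule is the grown-up version.
    import shutil
    import time
    from datetime import datetime
    from pathlib import Path

    SRC = Path("/srv/data")       # hypothetical data directory
    DEST = Path("/mnt/backup")    # hypothetical backup volume

    while True:
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        shutil.copytree(SRC, DEST / stamp)  # one full copy per run
        time.sleep(3600)                    # once an hour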
Nobody said that you shouldn't have any data in the cloud. The argument is that you shouldn't have your data only in some cloud.
If you have your data in some cloud (either directly or as a backup) as well as in your really crappy backup solution that has a 10% failure rate, you are still ten times less likely to lose your data than by just keeping it in the cloud.
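Rough arithmetic, assuming the two failure modes are independent and picking a made-up number for the cloud side:

    # Back-of-the-envelope: you only lose data if BOTH copies fail.
    # The cloud figure is invented for illustration; independence is assumed.
    p_cloud_loss   = 0.001  # hypothetical chance the cloud copy is lost
    p_backup_fails = 0.10   # the "really crappy" home backup failing when needed

    p_lose_both = p_cloud_loss * p_backup_fails
    print(p_lose_both)      # 0.0001 -- ten times less likely than cloud-only (0.001)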
It's a good point, but I can't help but think of all the mistakes, disclosures, privacy violations, poor design, gratuitous change, etc. that have happened at the hands of "dedicated engineers." In this case specifically, are the engineers who caused this incident not dedicated and well paid?
Just to be clear: if you're using AWS, GCP, or Azure to host your own applications, it's on you to manage disaster & recovery. Those companies make doing that much easier than managing your own DC, and yes, the reliability is going to be better than DIY (but the risk is still never zero). I think you mean more the SaaS applications, or anything that "phones home" data to back it up, right?
We're going to start seeing more business continuity audits of SaaS players, akin to a BBB rating for the company's ability to maintain service levels. I thought I came across a website that actually has started doing this, but I can't recall which it was.
I think it's more about shifting blame than actually providing more reliable services. I regularly see S3 throw errors when reading and writing tens of thousands of files (Spark w/ Parquet). It's not that it's MORE reliable (although it is very reliable); it's just that when it isn't, it's somebody else's problem and responsibility to fix it.
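For what it's worth, those transient errors are the kind you're expected to absorb with retries yourself; a sketch with boto3/botocore, where the retry counts and object names are arbitrary examples:

    # Transient S3 errors (e.g. 503 SlowDown under heavy load) are normally retried away.
    # Retry counts and object names here are arbitrary examples, not recommendations.
    import boto3
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(retries={"max_attempts": 10, "mode": "standard"}))
    obj = s3.get_object(Bucket="my-bucket", Key="data/part-00000.parquet")  # hypothetical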
Centralization decreases frequency of failure but increases its cost. A really large scale nasty incident with a major cloud provider like Amazon could be a national state of emergency.
It really is only a matter of time. The problem with low-frequency events is that you never have any idea how realistic your modeling is, and the only way you'll find out is when that once-in-a-1000-years event happens tomorrow morning.
Agreed, there was no data loss, but those who relied solely on US-Standard with no replication elsewhere were dead in the water for the duration of the outage.
Sure, but that shows up in the risk calculations when you're choosing a cloud provider. I imagine for just about everyone it was cheaper to eat the loss on the day of the outage than to spend the time/effort/resources to do it right. Especially when it made national news that it was Amazon's fault so nobody blamed the sites that were down.
> Especially when it made national news that it was Amazon's fault so nobody blamed the sites that were down.
That's an interesting viewpoint. I really don't agree with it though. When your service is down that is your responsibility, never Amazon's. And when you lose data that is your responsibility too, not your cloud provider's.
S3 has had a data loss too, sadly - a console UI bug led to the wrong files getting deleted, if I recall correctly. Yet another reason for frontend web tech to improve.
Not sure why you're citing some unrelated incident - I think you're jumping to the conclusion that I'm talking about the one you cited, but I'm not. This one was not made public.
It was a bit surprising when I found out, but I think the really interesting part is that low-quality web tech is the weak link in the chain. That was an eye-opener.