AWS is having widespread issues (twitter.com/chafikhnini)
111 points by forrestbrazeal on July 27, 2017 | 64 comments


I've been working at my first software dev job for a few months now. I sat down at work today and, for the first time, I had to launch and configure an EC2 instance. Of course, within the first few minutes of getting started AWS starts having issues.


Great, you broke it.


It's called manual testing and I clearly did my job.


<redacted>


It's his first job, he doesn't have the experience necessary to instinctively find this funny.


I was joking as well :)


Ohhhh I lose


Thank you for a good laugh to start my day haha


Another classic example of a junior dev breaking the production build. When will the CTOs learn?


It's intern season


[flagged]


It's absolutely not OK to make personal attacks like this on Hacker News.

https://news.ycombinator.com/newsguidelines.html


At Zapier we saw half the internet on AWS blip out for a bit (us too), but it seems to have been short lived. Approximately Jul 27, 2017 13:47:45 to Jul 27, 2017 13:59:33 (UTC) as far as we could tell.


From our EC2 dashboard in us-east-1:

[RESOLVED] Network Connectivity 07:28 AM PDT Between 6:47 AM and 7:10 AM PDT we experienced increased launch failures for EC2 Instances, degraded EBS volume performance and connectivity issues for some instances in a single Availability Zone in the US-EAST-1 Region.

edit: looks like this message is now on the status page


This is why I am really scared of companies owning too much market share. I mean, literally, who isn't running or using anything that runs on AWS?


> I mean, literally, who isn't running or using anything that runs on AWS?

Google and Microsoft both run their own equivalents to AWS (Google Cloud and Azure, respectively).

They don't have as much market share as AWS does, but they're a lot larger than you might expect.


Every company I've ever worked for, for one.

There are huge swaths of the internet not affected by AWS, just as there are (other) huge swaths not affected by Google or Azure.


Google



There was definitely an issue. Around 25% of our servers in one availability zone of us-east-1 fell off the network for 15 minutes or so, starting around 13:47 GMT. They're back now.

During this time period, we were also unable to access the console (500 errors).


Word from Amazon is "elevated packet loss", but I saw pretty much the same. Elevated to 100%, I guess.


Why does it always seem to be us-east?


us-east-1 is indeed the oldest and has the most non-standard configuration of the bunch (besides China, of course). I'd definitely recommend us-east-2 or us-west-2 for any new deployments.


I'd guess it's the oldest data centre so more prone to failures, but that's pure speculation.


Isn't it the cheapest region to use? Probably sees more use because of it.


us-west-2 (Oregon) and us-east-2 (Ohio) are the same price as us-east-1 (Virginia). At least that's true for most resources; I didn't check the full price list.

I don't know about Ohio since I don't use it, but we've had far fewer problems in us-west-2 than in us-east-1.


If I have a single-region service, I always put it in us-west-2. It's super reliable, and gets updates after us-east-1 and us-west-1, which means all the kinks are out before they hit us-west-2.

On days like today, I get a message, without fail, from my friend who works at a shop where everything is in us-east-1 (multi-AZ) about how much he hates me for avoiding east like the plague.


Which AZ?


"C", but that's meaningless because AWS scrambles the zone names for each account. (Presumably to prevent everyone from putting all their servers in "A".)


Interesting - I had no idea they did that. Well, I guess checking my instances in C won't be any help to you. Sorry!



Haha - I didn't know that. Makes sense. I've got a dropdown in one of my CloudFormation scripts for AZs, and every time I get to it, I spend way more time thinking about it than I should. You've saved me some time.


From my dashboard: EC2 VPC network health intra AZ issue

The issue that began at Thu, 27 Jul 2017 13:53:00 GMT has been resolved and the service is operating normally.

Start time July 27, 2017 at 9:53:00 AM UTC-4 End time July 27, 2017 at 10:08:00 AM UTC-4


Now getting Lambda provisioning errors in us-east-1:

LAMBDA_FAILED: ServiceException: We currently do not have sufficient capacity in the region you requested. Our system will be working on provisioning additional capacity. You can avoid getting this error by temporarily reducing your request rate.

I wonder if they had to take part of their fleet offline due to the issues
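For what it's worth, a plain backoff loop is usually enough to ride these out. A rough sketch, assuming the error surfaces as a botocore ClientError with code ServiceException; the function name is a placeholder:

    # Retry a Lambda invoke with exponential backoff when the region is
    # temporarily out of capacity.
    import time
    import boto3
    from botocore.exceptions import ClientError

    lam = boto3.client("lambda", region_name="us-east-1")

    def invoke_with_backoff(function_name, payload=b"{}", max_attempts=5):
        delay = 1.0
        for attempt in range(1, max_attempts + 1):
            try:
                return lam.invoke(FunctionName=function_name, Payload=payload)
            except ClientError as err:
                code = err.response["Error"]["Code"]
                # ServiceException / TooManyRequestsException are worth retrying;
                # anything else is probably our fault, so re-raise.
                if code not in ("ServiceException", "TooManyRequestsException"):
                    raise
                if attempt == max_attempts:
                    raise
                time.sleep(delay)
                delay *= 2  # back off, per the error message's advice

    result = invoke_with_backoff("my-function")  # hypothetical function name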


Here comes the rarest opportunity of a live AWS outage postmortom. Wait... it should be called present-mortom.


It's mortem, a Latin word (accusative singular of "mors"). I can't think of any Latin declension or any Latin word that would end in "om".


Okay



The services mentioned here don't require reading a manual before use and don't come with a list of quirks. Good list.


Everything requires a manual and has a list of quirks if you're doing something non-trivial or high-volume. Everything has trade-offs; the only question is how much you get to know upfront.


Yeah, setting up AWS instances is a PITA. It still confuses me when I look at it.
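To be fair, the API side is only a couple of calls once the pieces exist. A rough boto3 sketch; the AMI ID, key pair, and security group below are placeholders you'd swap for your own:

    # Launch a single EC2 instance and wait for it to come up.
    import boto3

    ec2 = boto3.client("ec2", region_name="us-west-2")

    resp = ec2.run_instances(
        ImageId="ami-xxxxxxxx",     # placeholder: pick an AMI for your region
        InstanceType="t2.micro",
        KeyName="my-key-pair",      # placeholder: an existing key pair
        SecurityGroups=["my-sg"],   # placeholder: group names work in a default
                                    # VPC; otherwise use SecurityGroupIds
        MinCount=1,
        MaxCount=1,
    )
    instance_id = resp["Instances"][0]["InstanceId"]

    # Block until the instance reports as running.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    print("launched", instance_id)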


When AWS is having widespread issues half the internet seems to stop working. This looks like a 500 error on the console.

Is there any actual indication of AWS issues beyond one random person's tweet?

edit: Ah, https://twitter.com/ylastic just went on a retweeting tear. Looks like us-east-1?


Can't find a permalink, but I had this notification in our AWS console:

> Beginning at Thu, 27 Jul 2017 13:53:00 GMT, some instances are experiencing elevated packet loss in the us-east-1a Availability Zone. We are now investigating this issue.

Some of our instances weren't reachable for about 10 minutes.


At my company we are seeing issues in us-east-1 involving KMS and EC2.


Interesting - I use both those services in us-east-1 and haven't experienced issues. https://status.aws.amazon.com/ also shows a sea of green, although I'm not sure the page is even functional; even when I know AWS has been having issues, it's a sea of green.
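If you'd rather not eyeball it, the status page has historically exposed per-service RSS feeds you can poll yourself; the feed URL pattern below is an assumption and may have changed:

    # Poll the per-service status RSS feed instead of the dashboard.
    import urllib.request
    import xml.etree.ElementTree as ET

    # Assumed URL pattern for the EC2/us-east-1 feed; adjust if AWS changes it.
    FEED = "https://status.aws.amazon.com/rss/ec2-us-east-1.rss"

    with urllib.request.urlopen(FEED) as resp:
        tree = ET.parse(resp)

    # Each <item> is a status event; the newest one comes first.
    for item in tree.iterfind(".//item"):
        title = item.findtext("title", default="")
        date = item.findtext("pubDate", default="")
        print(date, "-", title)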


In the last major AWS outage, it stayed green because updating it depended on some services affected by the outage.

Even when they can update it, it seems to be a manual process.


I think they've scrapped the AWS dependencies there, which were awfully silly. But it doesn't really seem to update regardless, and when it does it's a cute little 'i' on the green checkmark to inform you that everything is fine except for the 'actually working' part.


After nearly 10 years working with AWS, I've learned to never trust that page.


Right now it seems that my $1/month shared hosting has less downtime than AWS this year.


If AWS were running "$1/month shared hosting" I'm sure they'd have better uptime too. Apples and oranges.


We're heavily dependent on AWS, and haven't seen any issues yet today.


It seems like the issues are only in us-east-1.


If there's "elevated packet loss" to/from EBS, which is your disk - does that mean that people had to rebuild or redeploy instances using EBS storage?


I wonder how many people die because of their smart homes being dependent on some network thing this time around. I'm 80% kidding, I think?


If you have critical systems completely reliant on the cloud, then that's just the Darwin Awards in action.


The problem is that the people dying aren't going to be the people who implemented the solution.


They are going to be the people who bought them. It's still their decision.

That said, we have governments for a reason.



Holy shit. I really, really, really hope no one died due to that person's colossal stupidity.


Only one availability zone is down.


I have a hard time having sympathy for someone who puts together something that critical with infrastructure meant primarily to sell to modern day gold diggers disguised as technologists.


We're seeing high error rates writing to Kinesis


I have external scripts monitoring my Lightsail instance; there was no downtime for Lightsail.

Edit: The instance is in Ohio.
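Nothing fancy on the monitoring side; it's roughly this, run from a box outside AWS (the URL is a placeholder for the instance's health endpoint):

    # External uptime check: hit a health endpoint and log failures.
    import time
    import urllib.request
    import urllib.error

    URL = "https://example.com/health"  # placeholder for the instance's endpoint

    def check_once(url, timeout=10):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    while True:
        if not check_once(URL):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "DOWN")
        time.sleep(60)  # one probe a minute is plenty for this purpose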


Jira is now down too. Not Bitbucket, though!


If only it would stay down.



