
Is it me or has there been a very noticeable uptick in large-scale infra-level outages lately? AWS, Cloudflare, etc. have all been way under whatever SLA they publish.




Coincidentally, large tech companies have been conducting mass layoffs and claim they're going to rely on AI much more to replace junior developers.

And they are offshoring roles to lower quality devs.

Interestingly, chatgpt was unavailable due to the same cloudflare outage.

Imagine vibe coding something in production, it breaks half the internet, then you can't vibe code it back because it broke the LLM providers. A real catch-22 for the modern age!

By similar thinking, you could blame large tech companies if they hired too many juniors.

Juniors, at least, have the capacity to learn.

That does seem to be a coincidence, as the recent outages making headlines (including this one, according to early reports) have been associated with huge traffic spikes. It seems DDoS attacks are reaching a new level.

AWS's most recent blow-up was not a DDoS

Maybe a laid-off engineer is bored and started orchestrating DDoS campaigns in their newly-found free time.

For me the only silver lining to all these cloud outages is that now we know their published SLA times mean absolutely nothing. The number of 9's used to at least give an indication of intended reliability; now they are twisted to fit whatever metric the company wants to present and don't actually represent guaranteed uptime anywhere.

So true. AWS for example gives only platform credits in the event of an outage. Basically no recourse or insurance.

Doesn’t everyone do that? I’ve never worked for a place that the base policy wasn’t credits. You might have special contract language stating otherwise, but for almost everyone, it’s credits.

Yeah everyone does that. Perhaps it was wrong to single out AWS! None of them should. In any other business, you get your money back.
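
For anyone wondering what the "number of 9's" upthread works out to, here is a rough sketch of the arithmetic. The availability figures are illustrative, not any provider's actual SLA, and the yearly window is an assumption (many SLAs are measured per month or per billing cycle). The name `allowed_downtime_minutes` is just for this example.

```python
# Illustrative only: converts an availability percentage into the downtime
# it permits over one (non-leap) year. Not tied to any real provider's SLA.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent):
    """Minutes of downtime per year permitted at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Hypothetical targets, from "two nines" to "five nines".
    for pct in (99.0, 99.9, 99.99, 99.999):
        print(f"{pct}% -> {allowed_downtime_minutes(pct):.1f} min/year")
```

Three nines allows roughly 8.8 hours a year, so a single multi-hour outage can use up most of that budget.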

Some of the other commenters here have posited a "vibe code theory". As the amount of vibe code in production increases, so does the number of bugs and, therefore, the number of outages.

None of the recent major outages were traced to "vibe coding" or anything of the sort. They appear to be the kind of misconfigurations and networking fuckups that have existed since the Internet became more complex than 3 routers.

The "vibe thinking" trend where people stop using their brain and rely on whatever random output the LLM tells them is harder to diagnose, but it's certainly there and at least as bad as vibe coding.

What about the “vibe thinking” trend where people project their own narratives on to every situation, even if the information available shows that it’s a rise in large scale DDoS attacks?

Unfortunately, not a trend. Just human nature. I hope they'll find a fix for that one day.

How likely are we to know when a "misconfiguration or networking fuckup" is due to someone asking ChatGPT how to do the task?

>misconfigurations and networking fuckups that existed since Internet became more complex than 3 routers.

Yet there has been an uptick in frequency of outages only in the recent few months. Correlation correlation.

Why assume that these misconfigs are not the result of someone asking AI how to do them?


Is it a statistically significant uptick though? Random events aren't equally spaced: sometimes there will be more, sometimes there will be fewer.
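
A quick way to see this is to simulate it. The sketch below assumes outages arrive as a constant-rate Poisson process, which is a simplification and not a claim about how real outages behave. The rate of one per month and the function name `simulate_monthly_counts` are made up for illustration.

```python
import random


def simulate_monthly_counts(rate_per_month, months, seed=None):
    """Count events per month for a constant-rate Poisson process.

    Gaps between events are drawn from an exponential distribution,
    so the underlying rate never changes over the simulated period.
    """
    rng = random.Random(seed)
    counts = [0] * months
    t = rng.expovariate(rate_per_month)  # time of the first event, in months
    while t < months:
        counts[int(t)] += 1  # int(t) is the zero-based month index
        t += rng.expovariate(rate_per_month)
    return counts


if __name__ == "__main__":
    # Hypothetical numbers: one outage per month on average, over two years.
    counts = simulate_monthly_counts(rate_per_month=1.0, months=24, seed=1)
    print(counts)
    print("quietest month:", min(counts), "busiest month:", max(counts))
```

Run it with a few different seeds and you will typically see some months with zero events and others with several, even though the rate is fixed. A cluster on its own is weak evidence of a trend.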

Wasn't the recent AWS outage a race condition that's existed since before vibe coding was a thing?

Speaking of "vibe-coding", I wonder how much their own outage is affecting their ability to vibe-code their way out of it.. :-)

The openai login page says:

    Please unblock challenges.cloudflare.com to proceed.

> Some of the other commenters here have posited a "vibe code theory". As the amount of vibe code in production increases, so does the number of bugs and, therefore, the number of outages.

Likely this coupled with the mass brain damage caused by never-ending COVID re-infections.

Since vaccines don't prevent transmission, and each re-infection increases the chances of long COVID complications, the only real protection right now is wearing a proper respirator everywhere you go, and basically nobody is doing that anymore.


Are you being hyperbolic? It's clearly not this, and very likely not GP's proposal either.

No, and it's easy to find ample research backing it up.

Agreed.

Most people are not self-reflective enough to notice. Need to trust the studies.

Far more plausible than the AI ideas.

I find it far more likely these are smart people running without oversight for years pre-COVID, relying on being smart at 2am change windows. Now half or a full std. dev. lower on the IQ scale, hubris means fewer guard rails before change, and far lower ability to recover during change window.


Exactly. The effects can include both "brain fog" as well as impaired judgement, since the brain areas affected have to do with executive function.

We can even see (measure) it in driving behavior patterns.

Another data point is how Hollywood has gone to great lengths to keep the whole thing hush hush, because such a downer is bad for business:

https://old.reddit.com/r/ZeroCovidCommunity/comments/1ncmclw...


I somehow doubt that if a majority of the population was a full standard deviation lower on the IQ scale that nobody would be talking about it, that there wouldn't be more research, and that it wouldn't be covered by the news whatsoever.

Keep in mind many parties benefit from capitalizing on hysterical hype. And you're going to sit here and tell me they're all keeping it covered up for some reason?

This comes off like the extreme left-wing equivalent of extreme right-wingers that say the "world is run by the evil jewish cabal" and when people ask for proof, they retort "everyone is hiding it so none exists".


I agree with you about the IQ thing but am not seeing how this is a left-wing thing.

For a select few, maybe, but it's obviously not enough of the population to make a significant difference in downtime outages in large tech corporations.

It's far more likely due to either AI, or more directly, layoffs and offshoring, as that affects hundreds of thousands of their employees.


I have become dumber without having contracted covid or other respiratory diseases (which could have been covid). 2020s have been the era of fascism, war and communities getting torn, which does not really help with stress levels and intellectual performance.

The theory I’ve heard is holiday deploy freezes coupled with Q4 goals creates pressure to get things in quickly and early. It’s all been in the last month or so which does line up.

What's different about this Q4 vs the last 20 years of Q4s?

The obvious answer is to cancel holidays.

My theory is a state-sponsored actor targeting some of these services, but maybe that's just too 'tinfoil hat' of me, who knows.

There are usually very comprehensive post mortems for these events, and none have suggested that at all

This only amplifies the often-repeated propaganda about the "very powerful" enemies of democracy, who in fact are very fragile dictatorships. There's enough incompetence at tech companies to f up their own stuff.

My theory is DNS.

Somewhere, at a floating desk behind a wall of lava lamps, in a nyancatified ghostty terminal with 32 different shader plugins installed:

You're absolutely right! I shouldn't have force pushed that change to master. Let me try and roll it back. * Confrobulating* Oh no! Cloudflare appears to be down and I cannot revert the change. Why don't you go make a cup of coffee until that comes back. This code is production ready, it's probably just a blip.


If it's any guidance, US cyber risk insurance (which covers, among other things, disruptions due to supplier outages) has continuously dropped in price since Q1 2023, by a few percent per year.

If you excuse the sloppy plot manually transcribed from market index data: https://i.xkqr.org/cyberinsurancecost.png


Don't forget Azure Front Door / half of Azure.

Yeah, but that's just standard for Azure.

I suspect the number of outages is the same, but the number of sites putting all of their eggs into these two baskets has grown considerably.

Unless you're making that determination statistically, it's probably pareidolia. See here: https://behavioralscientist.org/yates-expect-unexpected-why-...

It's you. Everything goes down once in a while.

GCP was down recently as well

Well AWS runs on Cloudflare...so thanks Cloudflare team!

Any chance our friend Vladimir is behind this?

it definitely feels like it.


