why the fuck is our critical infrastructure running on WINDOWS. Fuck the sad sta... | Hacker News


xyst on July 19, 2024 | parent | context | favorite | on: CrowdStrike Update: Windows Bluescreen and Boot Lo...

why the fuck is our critical infrastructure running on WINDOWS. Fuck the sad state of IT. CIOs and CTOs across the board need to be fired and held accountable for their shitty decisions in these industries. yes CRWD is a shitty company but seems they are a "necessity" by some stupid audit/regulatory board that oversees these industries. But at the end of the day, these CIOs/CTOs are completely fucking clueless as to the exact functions this software does on a regular basis. A few minions might raise an issue but they stupidly ignore them because "rEgUlAtOrY aUdIt rEqUiReS iT!1!"

lnrd on July 19, 2024 | [–]

The OS doesn't matter, the question should be why is critical infrastructure online and allowed to receive OTA updates from third parties.

lambda on July 19, 2024 | | [–]

While Linux isn't a panacea, the OS does matter as Linux provides tools for security scanners like Crowdstrike to operate entirely in userspace, with just a sandboxed eBPF program performing the filtering and blocking within the kernel. And yes, CrowdStrike supports this mode of operation, which I'll be advocating we switch over to on Monday. So yeah, for this specific issue, Linux provides a specific feature that would have prevented this issue.

CaliforniaKarl on July 19, 2024 | | | [–]

BPF-based CrowdStrike is relatively recent, partially because, from the Enterprise Linux perspective, kernel support is relatively recent.

For example, BPF-based CrowdStrike works on Enterprise Linux 9 and Debian 12. I don't know if the necessary support was in EL 8 or Debian 11.

taftster on July 19, 2024 | | | | [–]

Right! Windows should NEVER blue screen. Ever. From a third-party software.

Maybe Windows doesn't provide the right ABI or whatever for CS, but come on, you should never be able to kernel panic Windows.

That this blue screened is 100% Microsoft's fault. It's a mess all the way around.

bogdan on July 20, 2024 | | | [–]

Poe's law?

notfed on July 20, 2024 | | | | [–]

I mean, you can crash Linux too with bad kernel code.

tivert on July 19, 2024 | | | [–]

> The OS doesn't matter, the question should be why is critical infrastructure online and allowed to receive OTA updates from third parties.

Not exactly. I think the question is why is critical infrastructure getting OTA updates from third parties automatically deployed directly to PROD without any testing.

These updates need to go to a staging environment first, get vetted, and only then go to PROD. Another upside of that is it won't go to PROD everywhere all at once, resulting in such a worldwide shitshow.

pertymcpert on July 20, 2024 | | | [–]

I think you have the priority backwards. We shouldn’t be relying on trusting the QA process of a private company for national security systems. Our systems should have been resilient in the face of Crowdstrike incompetence.

tivert on July 22, 2024 | | | [–]

> I think you have the priority backwards. We shouldn’t be relying on trusting the QA process of a private company for national security systems. Our systems should have been resilient in the face of Crowdstrike incompetence.

I think you misunderstood me. I wasn't talking about Crowdstrike having a staging environment, I was talking about their customers. So 911 doesn't go down immediately once Crowdstrike pushes a bad update, because the 911 center administrator stages the update, sees that it's bad, and refuses to push it to PROD.

I think that would even provide some resiliency in the face of incompetent system administrators, because even if they just hit "install" on every update, they'll tend to do it at different times of day, which will slow the rollout of bad updates and limit their impact. And the incompetent admin might not hit "install" because he read the news that day.
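
To make the customer-side staging idea concrete, here is a minimal sketch, not anything CrowdStrike or any real fleet tool provides. The `Ring` type, the `staged_rollout` function, and the host names are all made up for illustration; `apply_update` and `is_healthy` stand in for whatever your own tooling does.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple


@dataclass
class Ring:
    """A named group of hosts that receives an update together (hypothetical)."""
    name: str
    hosts: List[str]


def staged_rollout(
    update_id: str,
    rings: List[Ring],
    apply_update: Callable[[str, str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Apply the update one ring at a time, in order.

    Stops at the first ring where any host is unhealthy after the update.
    Returns the names of the rings that were updated and stayed healthy.
    """
    completed = []
    for ring in rings:
        for host in ring.hosts:
            apply_update(host, update_id)
        if not all(is_healthy(host) for host in ring.hosts):
            break  # refuse to push any further
        completed.append(ring.name)
    return completed


def simulate(update_is_bad: bool) -> Tuple[List[str], List[str]]:
    """Toy fleet: a bad update breaks every host it is applied to."""
    broken: Set[str] = set()

    def apply_update(host: str, update_id: str) -> None:
        if update_is_bad:
            broken.add(host)

    def is_healthy(host: str) -> bool:
        return host not in broken

    rings = [
        Ring("staging", ["stg-1"]),
        Ring("prod-early", ["prod-1", "prod-2"]),
        Ring("prod-rest", ["prod-3", "prod-4"]),
    ]
    done = staged_rollout("update-001", rings, apply_update, is_healthy)
    return done, sorted(broken)
```

In this toy run a bad update only ever breaks `stg-1`, because the loop stops before touching any production ring. A real health check would have to be something like "did the box come back up and report in", which is an assumption here, not a given.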

gtvwill on July 19, 2024 | | | | [–]

Lol if they can't do staging to mitigate balls ups on the high availability infrastructure side (optus in aus earlier this year pushed a router config that took down 000 emergency for a good chunk of the nation) we got bugger all hope of big companies getting it further up the stack in software.

worik on July 19, 2024 | | | | [–]

> why is critical infrastructure getting OTA updates from third parties automatically deployed directly to PROD without any testing.

I am missing some details, perhaps

From what I see this was an update from Crowdstrike. They are a first party, no?

Was another party involved?

kortilla on July 19, 2024 | | | [–]

They are a third party software provider

killerstorm on July 19, 2024 | | | [–]

OS absolutely does matter. Windows has an enormous attack surface because Microsoft doesn't care.

There's a number of minimal operating systems without all bells and whistles. The reason they aren't as popular a choice is the "OS doesn't matter" attitude.

If OS is minimal it doesn't need OTA updates, let alone from a third party...

nequo on July 19, 2024 | | | [–]

In this case it wasn’t an update to the OS but an update to something running on the OS supplied by an unrelated vendor.

But if we entertain the idea that another OS would not need CrowdStrike or anything else that required updates to begin with, I have doubts. Even your CPU needs microcode updates nowadays.

cpill on July 19, 2024 | | | [–]

Of course the OS matters! Windows is a nasty ball of patches in order to maintain backward compatibility with the 80s. Linux and OSX don't have to maintain all the nasty hacks to keep this backward compatibility.

Also, Crowdstrike is a security (patch) company because Windows security sucks to the point they have, by default, real-time virus protection running constantly (runs my CPU white hot for half the day, can you imagine the global impact on the environment?!).

It's so bad on security that it's given birth to a whole industry to fix it, i.e. Crowdstrike. Every time I pass a bluescreen in a train station or advertisement I'm like "hA! you deserve that for choosing Windows".

rbanffy on July 19, 2024 | | | [–]

IBM’s z/OS maintains compatibility with the 60’s, and machines running it continued to process billions of transactions every second without taking a break.

The OS matters, as well as the ecosystem and, and this is most important, the developer and operations culture around it.

cyberax on July 19, 2024 | | | | [–]

> Of course the OS matters! Windows is a nasty ball of patches in order to maintain backward compatibility with the 80s. Linux and OSX don't have to maintain all the nasty hacks to keep this backward compatibility.

Just don't tell that to Linus Torvalds :) Because Linux absolutely does maintain compatibility with old ABI from 90-s.

rbanffy on July 19, 2024 | | | [–]

> Just don't tell that to Linus Torvalds :) Because Linux absolutely does maintain compatibility with old ABI from 90-s.

That’s nothing. IBM’s z/OS maintains compatibility with systems dating all the way back to the 60’s. If they want to think they are reading a stack of punch cards, the OS is happy to fool them.

bg24 on July 19, 2024 | | | [–]

+1

With so much of tooling and products, it should come down to

- What am I running and their current security state

- Supply chain of any change that's happening

- Test/Stage/Rollout any change - do not trust the vendor, as they do not know your infrastructure

By allowing OTA update, they assumed that the vendor has tested all permutations.
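
A rough sketch of the "do not trust the vendor" step, assuming you keep your own list of digests for artifacts that passed your own testing. The function names and the sample byte strings are hypothetical; only `hashlib.sha256` is a real library call.

```python
import hashlib
from typing import Set


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an update artifact."""
    return hashlib.sha256(data).hexdigest()


def is_approved(artifact: bytes, approved_digests: Set[str]) -> bool:
    """True only if this exact artifact was recorded as tested and approved."""
    return sha256_hex(artifact) in approved_digests


# Hypothetical example: v1 went through test/stage, v2 just arrived from the vendor.
tested_v1 = b"content-file-v1"
untested_v2 = b"content-file-v2"
approved = {sha256_hex(tested_v1)}

v1_allowed = is_approved(tested_v1, approved)
v2_allowed = is_approved(untested_v2, approved)
```

The point is only that the allow-list lives on your side: nothing gets applied until someone who knows your infrastructure has added its digest.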

aylons on July 19, 2024 | | | [–]

It matters as in it makes it easy for this kind of issue to cause this much damage with little to no recourse for a fast correction.

Not that Linux or whatever are all immune, but it definitely matters.

meibo on July 19, 2024 | | | [–]

You should look into what a kernel driver is. You can panic a Linux kernel with 2 lines of code just as you can panic a Windows kernel, they just got lucky that this fault didn't occur in their Linux version.

And to be honest, I don't think recovering from this would be that much easier for non-technical folk on a fully encrypted Linux machine, not that it's particularly hard on Windows, it's just a lot of machines to do it on.

rbanffy on July 19, 2024 | | | [–]

In Linux it could be implemented as an eBPF thing while most of the app runs in userspace.

And, for specialised uses, such as airline or ER systems, a cut-down specialised kernel with a minimal userland would not require the kind of protection Crowdstrike provides.

I’m sure the NSA wasn’t affected by this.

ashayh on July 19, 2024 | | | [–]

ebpf works in Windows as well.

teej on July 19, 2024 | | | [–]

The OS absolutely matters

marcosdumay on July 19, 2024 | | | [–]

The culture around the OS matters.

But this is a 3rd party software with ring-0 access to all of your computers deciding to break them. The technical features of the OS absolutely do not matter.

rbanffy on July 19, 2024 | | | [–]

The question is whether other OSs would require it to have kernel mode privileges. People run complicated stuff in kernel mode for performance, because the switch to/from userspace is expensive.

Guess what’s also expensive? A global outage is expensive. Much more than taking the performance hit a better, more isolated, design would avoid.

marcosdumay on July 20, 2024 | | | [–]

EDRs run in kernel mode for access, not performance. They monkey-patch your syscalls.

sixothree on July 19, 2024 | | | | [–]

The alternatives aren't in a position to fill the roles needed for the tasks at hand.

jeremycarter on July 19, 2024 | | | [–]

This is true. Linux large fleet management is still missing some features large enterprises demand. Do they need all those features, idk, but they demand them if they're switching from Windows.

rbanffy on July 19, 2024 | | | | [–]

What are the tasks in question?

xvector on July 19, 2024 | | | | [–]

No, what is stopping a similarly designed EDR from causing the same problem on Linux?

lallysingh on July 19, 2024 | | | [–]

From a comment above, Linux has features (eBPF) that let CrowdStrike stay out of the kernel.

The old "everyone else is just as bad" adage is bullshit. Some OSs are better suited than others.

fragmede on July 19, 2024 | | | [–]

From a comment elsewhere, a CS update took out Linux machines earlier this year.

Saris on July 19, 2024 | | [–]

Didn't CRWD cause a similar issue with Debian/RHEL a little while ago?

It sounds to me that the problem lies with CRWD and not with whatever OS it's installed on.

rbanffy on July 19, 2024 | | [–]

A kernel driver can, definitely, take down a Linux machine.

The question is whether someone should implement something like this as a kernel module when there are better ways.

okanat on July 19, 2024 | | | [–]

Windows also has better ways such as filter drivers and hooks. If everybody used Linux, Crowd Strike would still opt for the kernel driver since the software they create is effectively spyware that wants access to stuff as deep as possible.

If they opted for an eBPF service but put that into early boot chain, the bootloop or getting stuck could still happen.

The only long-term solution is to stop buying software from a company that has a track record of being pushy and having terrible software practices like rolling out updates to the entire field.

aenis on July 19, 2024 | | | [–]

I think the only real solution is for MSFT to stop allowing kernel level drivers, as Apple has already (sort of, but nearly) done. Sure, lots and lots of crap runs on windows in kernelspace, but what happened today cost a sizable fraction of world's GDP. There won't be a better wake up call.

rbanffy on July 19, 2024 | | | [–]

I hope that, in the future, we have better robot firmware validation protocols in place when pushing OTA updates.

Maybe Skynet didn't mean any of that - it was just a botched update.

__MatrixMan__ on July 20, 2024 | | | | [–]

But would the Linux sysadmins of the world play along in the way that the Windows sysadmins of the world did? I think they might've given Crowd Strike the finger and confined them to a smaller blast radius anyhow. And if they wouldn't have... well they will now.

rbanffy on July 20, 2024 | | | [–]

Third-party blobs running in kernel space being delivered through their own channels without anyone in the company signing them off?

I don’t think I ever met a Unix person with whom that idea would fly.

okanat on July 20, 2024 | | | [–]

Once it gets popular, I think it would happen. The business people and C-suite would request quick dirty solutions like Crowd Strike's offerings to check boxes when entering new markets and go around the red tape. So they'll force Unix people to do as they say or else.

__MatrixMan__ on July 20, 2024 | | | | [–]

Agreed. It's a safer culture because it grew up in the wild. Windows, by contrast, is for when everybody you're using it with has the same boss... places where sanity can be imposed by fiat.

If Microsoft is to be blamed here, it's not for the quality of their software, it's for fostering a culture where dangerous practices are deemed acceptable.

rbanffy on July 19, 2024 | | | | [–]

> If they opted for an eBPF service but put that into early boot chain, the bootloop or getting stuck could still happen.

If the in-kernel part is simple and passes data to a trusted userland application, the likelihood of a major outage like the one we saw is much reduced.

luxuryballs on July 19, 2024 | | [–]

More specifically why is critical stuff not equipped properly to revert itself and keep working and/or fail over? This should be built-in stuff at this point, have the last working OS snapshot on its own storage chip and automatically flash it back, even if it takes a physical switch… things like this just shouldn’t happen.
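
A toy model of the "flash back the last working snapshot" idea, assuming an A/B slot layout with a failed-boot counter. This is a simulation only: `BootState`, `choose_slot`, `record_boot`, and `boot_until_up` are invented names, and no real bootloader API is being described.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BootState:
    """Persistent boot bookkeeping (hypothetical layout)."""
    active: str = "A"        # slot holding the newest image, tried first
    fallback: str = "B"      # slot holding the last known-good snapshot
    failed_boots: int = 0    # consecutive failed boots of the active slot
    max_failures: int = 3


def choose_slot(state: BootState) -> str:
    """Boot the active slot unless it has already failed too many times."""
    if state.failed_boots >= state.max_failures:
        return state.fallback
    return state.active


def record_boot(state: BootState, slot: str, ok: bool) -> None:
    """Update the failure counter; only boots of the active slot are counted."""
    if slot != state.active:
        return
    state.failed_boots = 0 if ok else state.failed_boots + 1


def boot_until_up(
    state: BootState, slot_works: Dict[str, bool], max_attempts: int = 10
) -> Optional[str]:
    """Simulate reboots; return the slot that finally came up, or None."""
    for _ in range(max_attempts):
        slot = choose_slot(state)
        ok = slot_works[slot]
        record_boot(state, slot, ok)
        if ok:
            return slot
    return None
```

With slot A broken and slot B intact, the simulated machine fails three times on A and then comes up on B without anyone touching it. It does nothing for you if the known-good snapshot is also broken, which is why the physical switch in the comment above is still a reasonable last resort.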

afavour on July 19, 2024 | | [–]

> why the fuck is our critical infrastructure running on WINDOWS

Because it’s cheaper.

I feel like many in this thread are obsessing over the choice of OS when the actual core question is why, given the insane money we spend on healthcare, are all healthcare systems shitty and underinvested?

A sensible, well constructed system would have fallbacks, no matter if the OS of choice is Windows or Linux.

rbanffy on July 19, 2024 | | [–]

The difference is that lots of different companies can share the burden of implementing all that in Linux (or BSD, or anything else) while only Microsoft can implement that functionality in Windows and even their resources are limited.

afavour on July 19, 2024 | | | [–]

Very little healthcare functionality would ever need to be created at the OS level. The burden could be shared no matter if machines were running Windows or Linux, they’re mostly just regular applications.

rbanffy on July 19, 2024 | | | [–]

Not talking about the applications - those could be ported and, ideally, financed by something like the UNDP so that the same tools are available everywhere to any interested party.

I'm talking about Crowdstrike's Falcon-like monitoring. It exists to intercept "suspicious" activity by userland applications and/or other kernel modules.

freeopinion on July 19, 2024 | | | [–]

Cheaper? Well, perhaps when you require your OS to have some sort of support contract. And your support vendor charges you unhealthy sums.

And then you get to see the value of the millions of dollars you've paid for support contracts that don't protect your systems at all. But those contracts do protect specific employees. When the sky falls down, the big money execs don't have a solution. But it's not their fault because the support experts they pay huge sums don't have solutions either. Somehow paying millions of dollars to support contractors that can't save you is not seen as a fireable offense. Instead it is a career-saving scapegoat.

Within companies that have been bitten this time, the team that wasn't affected because they made better process decisions will not be promoted as smarter. Their voice will continue to be marginalized by the people whose decisions led to this disaster. Because, hey, look, everyone got bit right? Nobody looks around to notice the people who were not bitten and recognize their better choices. And "I told you so" is a pretty bad look right now.

VHRanger on July 20, 2024 | | | [–]

> I feel like many in this thread are obsessing over the choice of OS when the actual core question is why, given the insane money we spend on healthcare, are all healthcare systems shitty and underinvested?

Because it's basically impossible to compete in the space.

Epic is a pile of horseshit, but you try convincing a hospital to sign up to your better version.

zjaffee on July 19, 2024 | | [–]

Tons of critical infrastructure in the US is run on IBM z/OS. It doesn't matter what operating system you use, what matters is updates aren't automatic and everything is as air gapped as possible.

worik on July 19, 2024 | | [–]

> why the fuck is our critical infrastructure running on WINDOWS.

That hits the nail on the head.

But it is a rhetorical question. We know why, generally, software sucks, and specifically why Windows is the worst and is the most popular.

Good software is developed by pointy headed nerds (like us) and successful software is marketed to business executives who have serious pathologies.

There are exceptions (I am struggling to think of one) where a serious piece of good software has survived being mass marketed, but the constraints (basically business and science) conflict

aenis on July 19, 2024 | | [–]

Nah, nope.

1/ linux is as vulnerable to kernel panics induced by such software. In fact, CS had a similar snafu mid April, affecting linux kernels. Luckily, there are far fewer moronic companies running CS on linux boxes at scale.

2/ it does offer protection - if you are running total shit architecture and you need to trust your endpoints not to be compromised, something like this is sadly a must.

Incidentally, google, which prides itself on running a zero-trust architecture, sent a lot of people home on Friday. Not so zero-trust after all, it seems.

Lots of armchair CIOs/CTOs in the comments today.

pertymcpert on July 20, 2024 | | [–]

Source on google sending home people?

mvdtnz on July 19, 2024 | | [–]

Windows is no more or less vulnerable to this class of issues than any other OS.

evilmwnci on July 19, 2024 | | [–]

Debatable. macOS did away with third-party kernel extensions. On Windows, CS runs in the kernel, and the kernel can't load properly because of CS.

dblohm7 on July 19, 2024 | | | [–]

Apple also 100% controls their hardware, so they can afford to do away with third-party kernel extensions.

npunt on July 19, 2024 | | | [–]

maybe that's what's required for critical infrastructure

fortran77 on July 19, 2024 | | [–]

Windows isn’t Crowdstrike.

cpill on July 19, 2024 | | [–]

No, it's just soooooo bad at security/stability that it gave birth to Crowdstrike. The very fact that Crowdstrike is so big and prevalent is proof of the gaping hole in Windows security. It's given birth to a multibillion dollar industry!

foobarchu on July 19, 2024 | | | [–]

Crowdstrike/falcon use is not by any means limited to Windows. Plenty of Linux heavy companies mandate it on all infrastructure (although I hope that changes after this incident).

rbanffy on July 19, 2024 | | | [–]

It’s mandated because someone believes Linux is as bad as Windows in that regard.

And, quite frankly, a well configured and properly locked down Windows would be as secure as a locked down Linux install. It’d also be a pain to use, but that’s a different question.

Critical systems should run a limited set of applications precisely to reduce attack surface.

sbuk on July 19, 2024 | | | | [–]

The reality is the wetware that interfaces with any OS is always going to be the weakest link. Doesn't matter what OS they run, I guarantee they will click links and download files from anywhere.

sudosysgen on July 20, 2024 | | | [–]

I can pretty easily make it so a user on Linux can't download executables and even then can't do any damage without a severe vulnerability. That is actually pretty difficult to do in a typical Windows AD deployment. There is a big difference between the two OSes.

In fact, there's a couple billion Linux devices running around locked down hard enough that the most clueless users you can imagine don't get their bank details stolen.

fortran77 on July 19, 2024 | | | | [–]

There’s Crowdstrike for Linux and Mac

supergirl on July 19, 2024 | | | | [–]

spoiler: crowdstrike is used by companies running on mac and linux as well

ar_lan on July 19, 2024 | | [–]

Odd that you choose Windows to swipe at when this was largely CRWD's problem + a mix of awful due diligence by IT departments.

heurist on July 19, 2024 | | [–]

The primary answer to your question is because it's expensive to switch.

marcosdumay on July 19, 2024 | [–]

> yes CRWD is a shitty company but seems they are a "necessity" by some stupid audit/regulatory board that oversees these industries.

Yep, this is the problem. The part about Windows is a distraction here.

That bullshit regulation is a much larger security issue than Windows. Incomparably so. If you run it over Linux, you'll get basically the same lack of security.
