Due to the scale, I think it's reasonable to say that in all likelihood many people have died because of this. Sure, it might be hard to attribute any single case, but statistically I would expect to see a general increase in mortality.
I used to work at MS and didn't like their 2:1 test-to-dev ratio, or their 0:1 ratio either, and wish they'd spent more effort on verification and improved processes instead of relying on testing - especially their current test-in-production approach. They got sloppy and this was just a matter of time. And god I hate their forced updates; it's a huge hole in the threat model, basically letting in children who like to play with matches.
My important stuff is basically air-gapped. There is a gateway, but it'll only accept incoming secure sockets with a pinned certificate, and only a predefined in-house protocol on that socket. No other traffic allowed. The thing is designed to gracefully degrade, with the idea that it'll keep working unattended for decades; the software should basically work forever so long as equivalent replacement hardware can be found.
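For a picture of what that gateway check amounts to, here's a minimal sketch in Python (the port, file names, pinned fingerprint, and the handle_inhouse_protocol stub are all placeholders, not the real system):

    import hashlib
    import socket
    import ssl

    # Placeholder: hex SHA-256 fingerprint of the one client certificate we accept.
    PINNED_SHA256 = "replace-with-the-real-fingerprint"

    def handle_inhouse_protocol(conn):
        ...  # speak the predefined in-house protocol here (placeholder stub)

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("gateway-cert.pem", "gateway-key.pem")
    ctx.verify_mode = ssl.CERT_REQUIRED          # the client must present a certificate
    ctx.load_verify_locations("inhouse-ca.pem")  # only the in-house CA is trusted

    with socket.create_server(("0.0.0.0", 9000)) as srv, \
         ctx.wrap_socket(srv, server_side=True) as tls:
        while True:
            try:
                conn, addr = tls.accept()        # TLS handshake happens here
            except ssl.SSLError:
                continue                         # handshake failed: drop, no other traffic allowed
            der = conn.getpeercert(binary_form=True)
            if hashlib.sha256(der).hexdigest() != PINNED_SHA256:
                conn.close()                     # wrong certificate: reject the connection
                continue
            handle_inhouse_protocol(conn)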
At one company I used to work for, we had boring, air-gapped systems that just worked all the time, until one day the security team demanded that we install this endpoint security software. Usually they would fight tooth and nail to prevent devs from giving any in-house program any network access, but they didn't even blink before giving internet access to those air-gapped systems, because the CrowdStrike agents need to talk to their mothership in AWS. It's all good, it's for better security!
It never caught any legit threat, but constantly flagged our own code. Our devs talked to security every other week to explain why some new line of code was not a threat. It generated a lot of work, and the security team's headcount just exploded. The software checked a lot of security checkboxes, and our CISO can sleep better at night, so I guess at the end of the day it's all worth it.
>It never caught any legit threat, but constantly flagged our own code
When I worked in large enterprise, it got to the point that if a piece of my app infrastructure started acting weird, the black-box security agents on the machines were the first thing I suspected. I can't tell you how many times they blocked legit traffic or blew up a host by failing to install an update or by logging it to death. The best part was when I would reach out to the teams responsible for the agents: they would always blame us, saying we didn't update, or weren't managing logs, etc. Mind you, these agents were not installed or managed by us in any way, were supposed to auto-update, and nothing else on the system outran the logrotate utility. Large enterprise IT security is all about checking boxes and generating paperwork and jobs. Most of the people I've interacted with on it have never even logged into a system or cloud console. By the end I took to openly calling them the compliance team instead of the security team.
I know I've lost tenders due to not using one of the pre-approved anti-virus vendors, which really does suck and has held back the growth of my company, but since I'm responsible for the security, it helps me sleep at night. This morning I woke up to a bunch of emails and texts asking me if my systems had been impacted by this, and it was nice to be able to confidently write back that we're completely unaffected.
I day-dream about being able to use immutable unikernels running on hypervisors, so that even if something were to get past a gateway there would be no way to modify the system to work in a way that was not intended.
Air-gapping with a super locked-down gateway was already getting more popular precisely because of the attack surface that forced updates create, and after today I expect it to be even more popular. At the very least I'll be able to point to this incident when explaining the rationale behind the architecture, which could help in getting exemptions from the antivirus box-ticking exercise.
I love their forced updates, because if you know what you're doing you can disable them, and if you don't know what you're doing, well, you shouldn't be disabling updates to begin with. I think people forget how virus-infested and bug-addled Windows used to be before they enforced updates. People wouldn't update for years and then bitch about how bad Windows was, when obviously the issue wasn't Windows at that point.
If the user wants to boot an older, known-insecure version so that they can continue taking 911 calls or scheduling surgeries... I say let 'em. Whether to exercise this capability should be a decision for each IT department, not imposed by Microsoft onto their whole swarm.
No, after the fact. Where's the prompt at boot-time which asks you if you want to load yesterday's known-good state, or today's recently-updated state?
It's missing because users are not to be trusted with such things, and that's a philosophy with harmful consequences.
I don't have any affected systems to test with, but I'd be pretty surprised if that were an effective mechanism for un-breaking the crowdstruck machines. Registry and driver configuration is a rather small part of the picture.
And I don't think that's an accident either. Microsoft is not interested in providing end users with the kind of rollback functionality that you see in Linux (you can just pick which kernel to boot to) because you can get less money by empowering your users and more money by cooperating with people who want to spy on them.
1) It is not just the enterprise version of Windows; it is any edition capable of GPO (so Pro applies too, Home doesn't).
2) It is not disabling them; it is approving or rejecting them (or even deferring the decision indefinitely).
You can do that too, via WSUS. It is not reserved for large enterprises, as I've seen claimed several times in this thread. It is available to anyone who has Windows Server in their network and is willing to install the WSUS role there.
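For reference, the client side of that is just a few Windows Update policy values; a sketch with winreg (the WSUS URL is a placeholder, in practice you'd push this through GPO rather than writing the registry by hand, and it needs to run as admin):

    # Point the Windows Update client at an internal WSUS server instead of Microsoft's
    # servers, so updates only arrive once an admin has approved them in WSUS.
    import winreg

    WU_KEY = r"SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate"
    WSUS_URL = "http://wsus.example.local:8530"  # placeholder internal WSUS server

    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, WU_KEY) as key:
        winreg.SetValueEx(key, "WUServer", 0, winreg.REG_SZ, WSUS_URL)
        winreg.SetValueEx(key, "WUStatusServer", 0, winreg.REG_SZ, WSUS_URL)

    with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, WU_KEY + r"\AU") as key:
        winreg.SetValueEx(key, "UseWUServer", 0, winreg.REG_DWORD, 1)  # honour WUServer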
We took 911 calls all night, I was up listening to the radio all night for my unit to be called. The problem was the dispatching software didn't work so we used paper and pen. Glory Days!!!!
It doesn't really matter to me that it's possible to configure your way out of Microsoft's botnet. They've created a culture around Windows that is insufficiently concerned with user consent, a consequence of which is that the actions of a dubiously trusted few have impacts that are too far and wide for comfort, impacts which cannot be mitigated by the users.
The power to intrude on our systems and run arbitrary code aggregates in the hands of people that we don't know unless we're clever enough to intervene. That's not something to be celebrated. It's creepy and we should be looking for a better way.
We should be looking for something involving explicit trust which, when revoked at a given timestamp, undoes the actions of the newly-distrusted party following that timestamp, even if that party is Microsoft or CrowdStrike or your sysadmin.
Sure, maybe the "sysadmin" is good-natured Chuck on the other side of the cube partition: somebody you can hit with a nerf dart. But maybe they're a hacker on the other side of the planet who has just locked your whole country out of its autonomous tractors. No way to be sure, so let's just not engage with that model of control in the first place. Let's make things that respect their users.
I'm specifically talking about security updates here. Vehicles have the same requirement with forced OTA updates. Remember, every compromised computer is just one more computer spreading malware and being used for DDoS.
Ignoring all of the other approaches to that problem, I wonder if this update will take the record for the most damage done by a single virus/update. At some point the ‘cure’ might be worse than the disease. If it were up to me, I would be suggesting different cures.
An immutable OS can be set up to revert to the previous version if a change causes a boot failure. Or even a COW filesystem with snapshots taken whenever changes are applied. Hell, Microsoft's own "System Restore" capability could do this, if MS provided default-on support for creating restore points automatically when system files are changed and rolling back after boot failures.
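On Linux the COW-snapshot version of that is tiny; a rough sketch, assuming a Btrfs root and an existing /.snapshots directory (both are my assumptions, not a universal layout):

    # Take a read-only snapshot of the root before applying an update, so there is a
    # known-good state to fall back to. Sketch only: assumes a Btrfs root filesystem,
    # an existing /.snapshots directory, and root privileges.
    import datetime
    import subprocess

    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    snapshot = f"/.snapshots/root-{stamp}"

    subprocess.run(["btrfs", "subvolume", "snapshot", "-r", "/", snapshot], check=True)

    # ... apply the update here ...

    # If the updated system then fails to boot, the pre-update state can be restored
    # from a rescue shell (e.g. with `btrfs subvolume set-default`), which is exactly
    # the fallback an immutable OS would automate on boot failure.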
What's funny to me is that in college we had our computer lab set up such that every computer could be quickly reverted to a good working state just by rebooting. Every boot was from a static known good image, and any changes made while the computer was on were just stored as an overlay on a separate disk. People installed all manner of software that crashed the machines, but they always came back up. To make any lasting changes to the machine you had to have a physical key. So with the right kind of paranoia you can build systems that are resilient to any harmful changes.
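That lab setup maps pretty directly onto what overlayfs does today: a read-only golden image underneath, and a scratch layer on top that gets thrown away at reboot. A rough sketch (all paths and the image name are placeholders, and it needs root):

    # Mount a known-good image read-only and direct every write to a throwaway overlay.
    # Sketch only; paths are placeholders.
    import subprocess

    def run(*cmd):
        subprocess.run(cmd, check=True)

    run("mount", "-o", "loop,ro", "/images/known-good.img", "/mnt/base")  # the golden image
    run("mount", "-t", "overlay", "overlay",
        "-o", "lowerdir=/mnt/base,upperdir=/scratch/upper,workdir=/scratch/work",
        "/mnt/system")

    # Everything written while the machine is up lands in /scratch/upper; wiping that
    # directory at reboot puts the system back to the known-good image, much like the
    # lab machines coming back clean after every boot.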
Well, not the OS per se, but macOS's update mechanism has an auto-restart path, and I imagine any Linux update that touches the kernel can be configured that way too. It's more the admin's decision than the OS's, but on all common systems auto-restart is part of the menu too.
MS could've leaned more towards user-space drivers instead of kernel drivers, though. Apple has been going in that direction for a while and I haven't seen much of that (if anything) coming from MS.
That would have prevented a bad driver from taking down a device.
Apple created their own filesystem to make this possible.
The system volume is signed by Apple. If the signature on boot doesn't match, it won't boot.
When the system is booted, the system volume is mounted read-only; there's no way to write anything to it.
If you bork it, you can simply reinstall macOS in place, without any data/application loss at all.
Of course, if you're a tinkerer, you can disable both SIP and the signature validation, but that cannot be done from user space. You'll need to boot into recovery mode to do that.
I don't think there's anything in NTFS or ReFS that would allow for this approach, especially when you account for the wide variety of setups an NTFS partition might sit on. With MBR, you're just SOL instantly.
Apple hardware, on the other hand, has been EFI (GPT) only for at least 15 years.
I don't know the specifics of this case, but formal verification of machine code is an option. Sure, it's hard and doesn't scale well, but if it's required then vendors will learn to make smaller kernel modules.
If something cannot be formally verified at the machine-code level, there should be a controls-level verification where vendors demonstrate they have a process in place for achieving correctness by construction.
Driver devs can be quite sloppy and copy-paste bad code from the internet; in the machine code, Microsoft can detect specific instances of known copy-pasted code and knows how to patch it. I know they did this for at least one common error. But if I were in the business of delivering an OS I wanted people to rely on, formal verification at some level would be table stakes.
I thought Microsoft did use formal verification for kernel-mode drivers and that this was supposed to be impossible. Is it only for their first-party code?
No, I believe 3rd-party driver developers must pass Hardware Lab Kit testing for their drivers to be properly signed. This testing includes a suite of Driver Verifier passes, but it is not formal verification in the mathematical sense of the term.
I wasn't privy to the extent it was used. If this was formally verified to be correct and still caused this problem, then that really would be something. I'm guessing, given the size and scope of an antivirus kernel module, that they may have had to make an exception but then didn't do enough controls checking.
There is a Windows Release Preview channel that exists for finding issues like this ahead of time.
To be fair - it is possible the conflicting OS update did not make it to that channel. It is also possible it is due to an embarrassing bug from MSFT (unknown as yet).
Until I hear that this is the case - I am pinning this on CrowdStrike. This should have been caught before prod.
Even if this is entirely due to CrowdStrike, I see it as Microsoft's failure to properly police their market.
There is the correctness-by-testing vs. correctness-by-construction dynamic, and in my view, given the scale of interactions between an OS and its kernel modules, trying to achieve correctness by testing is negligent. Even at Microsoft's market scale there are not enough Windows computers to preview-test every combination, especially when you take into account that the people on the preview ring behave differently from those on the mainline, so many combinations simply won't appear in the preview.
I see it as Microsoft owning the Windows kernel module space and having allowed sloppiness by third parties and by themselves. I don't know the specifics, but I could easily believe that this is due to a bug from Microsoft. The problem with allowing such sloppiness is that the sloppy operators out-compete the responsible operators; the bad pushes out the good until only the bad remains. A sloppy developer can push more code and gets promoted, while the careful developer gets fired.
There's not enough public information about it - but taking this talking point at face value, Microsoft did sign their kernel driver in order for it to be able to do this kind of damage. It's not publicly documented what validation they do as part of the certification and signing process.
The damage may have been done in a dependency which was not signed by Microsoft. Who knows? Hopefully we'll find out.
In general, a fair amount of the bad behavior of Windows devices since Vista has really been about poorly written drivers misbehaving, so there appears to be value in that talking point. Notably, all the Vista crashes after release (according to some sources, 30% of all Vista crashes were due to Nvidia drivers), and more recently, if you've ever put your Windows laptop to sleep and then discovered, when you take it out of your bag, that it had promptly woken back up and cooked itself into a dead battery (drivers not properly supporting sleep mode). WHQL has some things to answer for, for sure.
As a tester, I'm frustrated by how little support testing gets in this industry. You can't blame bad testing if it's impossible to get reasonable time and cooperation to do more than a perfunctory job.