[Rod Serling voice narration:] "Imagine, if you will, a very large cybersecurity company rolled out software that made millions of systems around the world crash and fail, wreaking havoc on millions of people."
It’s not The Twilight Zone. It’s the real-life CrowdStrike Perfect Storm. The cybersecurity company CrowdStrike recently pushed out an update with a significant problem: a faulty content file, automatically sent to and installed on its subscribers' physical and virtual Windows PCs, triggered an invalid memory reference (widely reported at the time as a null pointer) in the driver that read it. The driver crashed, taking CrowdStrike's Falcon security software package down with it, and the crash recurred on every restart until the bad file was removed. That, in turn, prevented impacted systems from booting up.
The invalid file was one of what the company calls channel files, which contain the definitions of viruses to block. As mentioned, the channel files are read by the program's drivers. Drivers operate at a low level, interacting with the hardware so that a device works with the operating system. The CrowdStrike drivers, which relied on the flawed data file, had to run for the system to start: if Falcon couldn't run, the system couldn't boot.
This situation highlights why cybersecurity software needs privileged, kernel-level access to the operating system to protect against malicious code. It also highlights the critical importance of issuing security patches that do not keep systems from booting. Ironically, while antivirus updates must be rolled out quickly to combat new malware, releasing them without thorough testing can cause outages and damage similar to what the viruses themselves cause. CrowdStrike's reliance on virtual validators for testing proved insufficient; testing on actual Windows devices would have caught the issue. The aftermath was substantial, with millions of customer systems affected.
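The failure mode above, a driver trusting a malformed data file and crashing, is the kind of problem that validating content before using it is meant to catch. The Python sketch below is purely illustrative: the file layout (a `DEFS1` header followed by one signature per line) and the names `validate` and `load_definitions` are hypothetical and do not reflect CrowdStrike's actual channel-file format or code. It shows the general idea of rejecting an all-zero or malformed update and falling back to the last known good definitions instead of crashing.

```python
"""Hypothetical sketch: validate a definitions update before using it.

Nothing here reflects CrowdStrike's real channel-file format or code. The
layout (a MAGIC header, then one signature per line) and all names are
illustrative assumptions.
"""
import tempfile
from pathlib import Path

MAGIC = b"DEFS1\n"  # hypothetical header marking a well-formed file


def validate(blob: bytes) -> list[str]:
    """Return the signatures in blob, or raise ValueError if it is malformed."""
    if not blob or blob.count(0) == len(blob):
        # Empty or made only of null bytes: nothing usable, reject outright.
        raise ValueError("file is empty or contains only null bytes")
    if not blob.startswith(MAGIC):
        raise ValueError("missing or corrupt header")
    # UnicodeDecodeError is a subclass of ValueError, so undecodable bytes are rejected too.
    body = blob[len(MAGIC):].decode("utf-8")
    signatures = [line for line in body.splitlines() if line.strip()]
    if not signatures:
        raise ValueError("header present but no signatures")
    return signatures


def load_definitions(update: Path, last_known_good: list[str]) -> list[str]:
    """Adopt the update only if it validates; otherwise keep the previous set."""
    try:
        return validate(update.read_bytes())
    except (OSError, ValueError) as err:  # unreadable file or failed validation
        print(f"rejected {update.name}: {err}")
        return last_known_good


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bad = Path(tmp) / "update.bin"
        bad.write_bytes(b"\x00" * 64)  # an update made entirely of null bytes
        print(load_definitions(bad, ["sig-a", "sig-b"]))
```

Falling back keeps the machine running on slightly older definitions, which is usually a better outcome than a boot loop. Code running inside the kernel would need equivalent checks in its own language, since an unhandled fault there brings down the whole system.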
The situation can be summed up as res ipsa loquitur, meaning the evidence speaks for itself, and a potential prima facie case of negligence: had adequate testing been performed, the systems would not have crashed and been prevented from booting; since the systems did crash and could not boot, that establishes the inadequacy of the testing. Although contracts may limit liability, they don't provide complete immunity from damage claims by first or third parties, nor from shareholder actions over reduced company value.
Fixing the problem was cumbersome and required manual intervention. Users had to reboot multiple times, hoping the software would download and replace the bad file before it caused a crash. Other solutions included booting into Safe Mode, entering BitLocker recovery keys, and/or using a USB stick with a special script from Microsoft. The combination of these factors created a "Perfect Storm" of cascading challenges, emphasizing the importance of thorough testing and the real-life consequences of neglecting it.
https://andrewtetzeli.substack.com/p/how-the-crowdstrike-perfect-storm