Now that the initial dust has settled after the CrowdStrike incident, in which a faulty update paralyzed 8.5 million Windows systems, new information has emerged. CrowdStrike has presented a preliminary investigation report on what exactly happened. There are first figures on the amount of damage, and CrowdStrike has announced "compensation". Microsoft points the finger at the EU, claiming an agreement with the Commission kept it from securing Windows better. And the BSI is calling on both CrowdStrike and Microsoft to do better.
The CrowdStrike incident
On July 19, 2024, 8.5 million Windows systems failed due to a faulty signature update in the CrowdStrike Falcon security software. Most of them were stuck in a blue screen loop, and some could no longer be booted at all. Only corporate computers were affected, as the CrowdStrike Falcon software is not used by private individuals.
As a result, airports came to a standstill, and trains, radio stations, petrol stations, stores and banks were affected. Administrators at the affected companies had to try booting their Windows computers up to 15 times in the hope that the faulty update would be replaced by a working version via the Internet, or else remove the faulty update manually on site to get the computers working again. I also reported promptly here in the blog on the problems with BitLocker recovery key prompts (see article links at the end of the post).
On systems where the CrowdStrike Falcon security software was used, a special filter driver (CSAgent.sys) was integrated into Windows at the lowest operating system level. The driver was certified by Microsoft and was already loaded by Windows during the boot process.
This filter driver then read information from files named 'C-00000291-…32.sys' and interpreted it. Incorrect data in this .sys file caused CSAgent.sys to crash the system with a blue screen. I had presented some hints with initial analyses in the blog post CrowdStrike analysis: Why an empty file led to BlueSceen.
The CrowdStrike Preliminary Post Incident Review (PIR)
In the meantime, CrowdStrike has published the Preliminary Post Incident Review (PIR): Content Configuration Update Impacting the Falcon Sensor and the Windows Operating System (BSOD), dated July 24, 2024, explaining what happened. The Falcon platform performs regular updates as part of its dynamic protection mechanisms. CrowdStrike delivers security content configuration updates to the Falcon sensors in two ways:
- Sensor content, which is shipped directly with the sensor,
- and Rapid Response Content, which is designed to respond quickly to the changing threat landscape.
The sensor content offers a wide range of functions to respond to threats. These are always part of a sensor release and are not dynamically updated via the cloud. The sensor content includes AI and machine learning models on the sensor and consists of code written explicitly to provide longer-term, reusable threat detection capabilities to CrowdStrike developers. The July 19, 2024 issue involved a rapid response content update with an undetected bug. Here are the details:
- On Friday, July 19, 2024 at 04:09 UTC, CrowdStrike released a content configuration update for the Windows sensor, delivered via a 'C-00000291-…32.sys' file, to collect telemetry data on possible new threat techniques.
- This Rapid Response Content configuration update caused the Windows system to crash when running sensor version 7.11 and above. Mac and Linux hosts were not affected.
- The faulty content update was reverted on Friday, July 19, 2024 at 05:27 UTC. Systems that came online after this time, or that did not connect during the time window, were not affected.
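The two timestamps above define the window in which a connected host could have pulled the faulty channel file. A minimal sketch of that arithmetic in Python follows; the variable names and the `was_exposed` helper are illustrative assumptions, not anything taken from the PIR, and the check deliberately ignores the other conditions (Windows host, sensor version 7.11 or later).

```python
from datetime import datetime, timezone

# Timestamps as stated in the PIR summary above (UTC).
released = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
reverted = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)

# Length of the exposure window in whole minutes.
exposure_minutes = int((reverted - released).total_seconds() // 60)


def was_exposed(online_at: datetime) -> bool:
    """Hypothetical helper: True if a host checked in while the faulty file was live."""
    return released <= online_at < reverted


print(exposure_minutes)  # 78
```

So the faulty file was available for 78 minutes, which was enough to reach 8.5 million machines.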
All sensor content updates go through a comprehensive quality assurance process that includes automated and manual testing, validation and rollout steps. However, the incident on July 19, 2024 was not due to an error in the sensor content, but occurred in the rapid response content.
The Rapid Response Content is used to perform a large number of behavioral pattern comparisons on the sensor. For this purpose, values with associated filtering are stored in fields as configuration data in a proprietary binary file. CrowdStrike works with templates that specify what the sensor should observe, recognize or prevent.
Rapid Response Content is delivered to the Falcon sensor from the cloud in the form of content configuration updates via so-called channel files. A content interpreter then reads the channel files and attempts to implement the monitoring specifications. CrowdStrike writes that the content interpreter is designed to reliably handle exceptions caused by potentially problematic content.
Newly published template types are stress tested for many aspects such as resource utilization, impact on system performance and event volume. The files also go through a content validator, which checks the content for validity before publication.
On July 19, 2024, two additional IPC template instances were deployed. Due to a bug in the content validator, one of the two passed validation despite containing problematic content data. According to CrowdStrike, the instances went into production on the strength of three things: the tests performed before the first deployment of the template type (on March 5, 2024), confidence in the checks performed by the content validator, and previous successful deployments of IPC template instances.
As soon as the problematic content was received by the sensor and loaded into the content interpreter, the data in channel file 291 led to an out-of-bounds memory read. This triggered an exception, which then crashed the Windows operating system with a blue screen (BSOD). The details can be found in the PIR linked above.
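To make the failure mode more tangible, here is a minimal sketch in Python of the two safeguards described above: a validator that rejects content before publication, and an interpreter that checks bounds again at read time instead of trusting the validator. This is not CrowdStrike's code, which runs as native code inside a kernel driver; `TemplateInstance`, `validate`, `interpret` and the index-based layout are all hypothetical and only illustrate the general technique.

```python
from dataclasses import dataclass


class ContentError(Exception):
    """Raised instead of reading outside the supplied values."""


@dataclass(frozen=True)
class TemplateInstance:
    # Hypothetical layout: indices of the input fields this instance matches on.
    field_indices: tuple[int, ...]


def validate(instance: TemplateInstance, supplied_fields: int) -> bool:
    """Validator: accept only instances whose indices all exist on the sensor."""
    return all(0 <= i < supplied_fields for i in instance.field_indices)


def interpret(instance: TemplateInstance, values: list[str]) -> list[str]:
    """Interpreter: re-check every index at read time (defense in depth)."""
    selected = []
    for i in instance.field_indices:
        if not 0 <= i < len(values):
            # In native kernel code, skipping this check means reading
            # arbitrary memory; here we fail in a controlled way instead.
            raise ContentError(f"field index {i} is outside the {len(values)} supplied values")
        selected.append(values[i])
    return selected
```

With three supplied values, an instance referencing indices `(0, 2)` passes both stages, while one referencing `(0, 3)` is rejected by `validate` and, if it slipped through anyway, raises `ContentError` in `interpret` rather than taking the whole system down.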
Security researcher Kevin Beaumont has published some tweets on X and this article on DoublePulsar with what he has learned from the incident. US Homeland Security also subpoenaed CrowdStrike and wanted to know what had happened.
Initial damage assessment
In previous blog posts, I had speculated that the incident may have resulted in billions in damage. In this article, the Guardian gives an initial figure for the amount of damage. The worldwide outage of Windows systems caused by CrowdStrike cost US Fortune 500 companies alone 5.4 billion US dollars. The insurer Parametrix writes that banking and healthcare companies as well as large airlines are likely to suffer the greatest losses.
The projected losses do not include the losses incurred by Microsoft, whose systems suffered major outages in the crash. And the above amount only relates to US companies, although there were failures worldwide. The insurer Parametrix estimates that the total insured losses for the US Fortune 500 companies, excluding Microsoft, could be between 540 million and 1.08 billion dollars.
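Setting the insured range against the 5.4 billion dollar total shows how small the covered share is. A quick back-of-the-envelope calculation in Python, using only the figures quoted above (the variable names are my own):

```python
# Figures quoted above, in US dollars (integers to avoid rounding noise).
total_loss = 5_400_000_000
insured_low = 540_000_000
insured_high = 1_080_000_000

# Insured share of the total loss, in percent.
share_low = insured_low * 100 / total_loss
share_high = insured_high * 100 / total_loss

print(f"{share_low:.0f}% to {share_high:.0f}% insured")  # 10% to 20% insured
```

In other words, if these estimates hold, 80 to 90 percent of the loss stays with the affected companies.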
Later I read within this neowin.net article that the global damage caused by the CrowdStrike incident is estimated at 15 billion US dollars.
What about compensation?
Calls for compensation were made quite quickly after the incident. The crucial point for many CrowdStrike customers is probably the following passage from the Terms and Conditions that govern the use of the Falcon software (Andreas drew my attention to this by email, thank you for that). Points 8.5 and 8.6 contain the following disclaimers:
8.5 No Guarantee. CUSTOMER ACKNOWLEDGES, UNDERSTANDS, AND AGREES THAT CROWDSTRIKE DOES NOT GUARANTEE OR WARRANT THAT IT WILL FIND, LOCATE, OR DISCOVER ALL OF CUSTOMER'S OR ITS AFFILIATES' SYSTEM THREATS, VULNERABILITIES, MALWARE, AND MALICIOUS SOFTWARE, AND CUSTOMER AND ITS AFFILIATES WILL NOT HOLD CROWDSTRIKE RESPONSIBLE THEREFOR.
8.6 Disclaimer. EXCEPT FOR THE EXPRESS WARRANTIES IN THIS SECTION 8, CROWDSTRIKE AND ITS AFFILIATES DISCLAIM ALL OTHER WARRANTIES, WHETHER EXPRESS, IMPLIED, STATUTORY OR OTHERWISE. TO THE MAXIMUM EXTENT PERMITTED UNDER APPLICABLE LAW, CROWDSTRIKE AND ITS AFFILIATES AND SUPPLIERS SPECIFICALLY DISCLAIM ALL IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, AND NON-INFRINGEMENT WITH RESPECT TO THE OFFERINGS AND CROWDSTRIKE TOOLS. THERE IS NO WARRANTY THAT THE OFFERINGS OR CROWDSTRIKE TOOLS WILL BE ERROR FREE, OR THAT THEY WILL OPERATE WITHOUT INTERRUPTION OR WILL FULFILL ANY OF CUSTOMER'S PARTICULAR PURPOSES OR NEEDS. THE OFFERINGS AND CROWDSTRIKE TOOLS ARE NOT FAULT-TOLERANT AND ARE NOT DESIGNED OR INTENDED FOR USE IN ANY HAZARDOUS ENVIRONMENT REQUIRING FAIL-SAFE PERFORMANCE OR OPERATION. NEITHER THE OFFERINGS NOR CROWDSTRIKE TOOLS ARE FOR USE IN THE OPERATION OF AIRCRAFT NAVIGATION, NUCLEAR FACILITIES, COMMUNICATION SYSTEMS, WEAPONS SYSTEMS, DIRECT OR INDIRECT LIFE-SUPPORT SYSTEMS, AIR TRAFFIC CONTROL, OR ANY APPLICATION OR INSTALLATION WHERE FAILURE COULD RESULT IN DEATH, SEVERE PHYSICAL INJURY, OR PROPERTY DAMAGE. Customer agrees that it is Customer's responsibility to ensure safe use of an Offering and the CrowdStrike Tools in such applications and installations. CROWDSTRIKE DOES NOT WARRANT ANY THIRD PARTY PRODUCTS OR SERVICES.
This can be interpreted broadly. If an airport breaks down because passengers cannot be processed, this has nothing to do with aircraft navigation. However, these are applications or systems whose failure can lead to material damage. The same applies to hospitals and their claims for damages. Lawyers and courts will certainly have to deal with this, and there is also the question of why the Falcon software was allowed to be used in such environments at all. It remains interesting.
That was a lot of bad news, but I still have one piece of good news: CrowdStrike has sent its partners a letter, which was shared in a tweet. The partners will receive an UberEats credit worth 10 US dollars for a coffee or snack as an apology for the trouble caused. Techcrunch reported on the issue here.
Microsoft blames the EU
In statements to the press, Microsoft refers to an agreement from 2009 in which it agreed with the EU Commission to open up the internals of the system to third-party providers. This implies that the EU is ultimately to blame for the whole disaster.
In a tweet, security expert Prof. Dennis Kipker refers to a press release from 2009 in which Microsoft presents the whole thing in a somewhat more "positive" light. He considers Microsoft's deliberate accusation that the EU is partly to blame for the global CrowdStrike debacle to be nothing more than political tactics.
I pointed out in one of my subsequent articles that the implementation is definitely important. Ryan Ries, Microsoft Windows Escalation Engineer, says on X:
My opinion is WHQL should no longer sign kernel modules that have the capability of downloading data from the internet that is not also signed by WHQL.
This touches on the question of what Microsoft allows in WHQL and is willing to sign. Here I refer to this German article from heise on why this disaster cannot happen with macOS. Apple has now abolished kernel extensions (kext) and only allows security software to run in user space. There, such a bug cannot drag the entire operating system into the abyss. I think this topic will keep us busy for a few more days.
Similar articles:
Worldwide outage of Microsoft 365 (July 19, 2024)
Windows systems throw BSOD due to faulty CrowdStrike update
Why numerous IT systems around the world failed due to two errors on July 19, 2024
CrowdStrike analysis: Why an empty file led to BlueSceen
Review of the CrowdStrike incident, the biggest computer glitch of all time
CrowdStrike incident: sensor failure as a previously unknown side effect?
CrowdStrike: Investigation report; amount of damages and compensation; attribution of blame