Today's blog post covers a general topic: what is a race condition? Some blog readers may have noticed this term while reading about update issues.
For example, my blog article Windows Server 2012 R2 / 2016: High CPU load with Update KB4345418 / KB4054566 mentions a race condition:
Addresses an issue that may cause some devices running network monitoring workloads to receive the 0xD1 Stop error because of a race condition after installing the July update.
The blog post Windows 10 V1607-V1709: Updates from June 21, 2018 also mentions a race condition:
Addresses an issue where Windows Defender Security Center and the Firewall Pillar app stop working when opened. This is caused by a race condition that occurs if third-party antivirus software has been installed.
The blog post Windows 7: Preview Rollup Update KB4088881 (03/23/2018) mentions a race condition during updating.
Addresses issue with a race condition in the Universal C Runtime (CRT) that occurs when you update the global locale. The issue corrupts the current locale reference count and triggers a double free condition.
So race conditions do not seem to be that rare. They can occur in all Windows versions, but also under macOS or Linux.
Race condition: some explanations
Wikipedia knows more about this: A race condition or race hazard is the behavior of an electronics, software, or other system where the output is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when events do not happen in the order the programmer intended.
Race conditions occur especially in multithreaded or distributed software. Unintentional race conditions are a common cause of hard-to-find bugs. Characteristic of such situations is that merely changing the test conditions, for example by adding logging or running in debug mode, can make the symptoms disappear completely.
Microsoft has published KB article 317723 (Description of race conditions and deadlocks) on this topic, though with a focus on Visual Basic 2003 and VB .NET 2003. The article deals with two competing threads accessing a shared variable. The race condition leads to unpredictable results.
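To make the shared-variable case more tangible, here is a minimal sketch in Python (not the Visual Basic code from the KB article). All names in it, such as run_two_threads and shared, are made up for illustration. Normally the outcome depends on timing, so the sketch uses a barrier to force the unlucky ordering on purpose: both threads read the old value before either one writes, and one update is lost. The second variant protects the read-modify-write step with a lock.

```python
import threading


def run_two_threads(use_lock):
    """Let two threads each add 1 to a shared value and return the result."""
    shared = {"value": 0}
    lock = threading.Lock()
    # The barrier exists only to force the "unlucky" interleaving every time:
    # both threads must have read before either of them writes.
    both_have_read = threading.Barrier(2)

    def unsafe_increment():
        current = shared["value"]       # read
        both_have_read.wait()           # the other thread reads the same old value
        shared["value"] = current + 1   # write: one update overwrites the other

    def safe_increment():
        with lock:                      # read-modify-write as one indivisible step
            shared["value"] = shared["value"] + 1

    target = safe_increment if use_lock else unsafe_increment
    threads = [threading.Thread(target=target) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"]


print("without lock:", run_two_threads(use_lock=False))  # 1, one increment is lost
print("with lock:   ", run_two_threads(use_lock=True))   # 2, as intended
```

In real software nobody places such a barrier, of course. The bad ordering then only happens occasionally, which is exactly why these bugs are so hard to reproduce.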
An example from March 2018
Susan Bradley addressed this topic on askwoody.com in March 2018, prompted by current events at the time (installing KB4075150 left many systems unable to boot, due to a race condition). Microsoft explained the problem as follows:
This issue occurs in the unlikely event, due to a race condition, that the Windows Update servicing stack incorrectly skips installing the newer version of some critical drivers in the cumulative update and uninstalls the currently active drivers during maintenance.
A driver was successfully uninstalled there (the transaction was successful). But the new driver was not installed due to a race condition (MS remains unspecific, so we have to call it a bug). Because acip.sys was a critical boot driver, the machines could not start. Susan Bradley discusses the details in the article at askwoody.com linked above and reports that, when attempting an uninstall with DISM, there was only one 'shot' at triggering the driver rollback.
Gunter,
I am not disputing anything you have said!
What I am disputing is one part of Microsoft flipping out this term "Race Condition" to cover up their fundamental screw-ups in Packaging, Delivery and Distribution on WU/MU. WSUS/SCCM is a different kettle of fish and, as we have seen over and over again on patchmanagement.org, is in some ways even worse at handling the Packages, Delivery and Distribution than WU/MU. This could be because WU/MU tend to be in a more or less perpetual STATE of update/upgrade for their actual software, more so than WSUS/SCCM. This is mainly due to WIP's new and edgy requirements through those Development/Processes. One can really see the results of that, mainly in W 7, which is much less flexible/resilient to these undulations.
Whenever I see the term "Race Condition" flipped out into the ether as an explanation, I watch very carefully now. The final end result has always (so far) boiled down to the very same thing. One KB package, internal to a CU or external to a CU, was LATE in being processed and installed. More often than not, it involves the SSU. That is why it is so critical that SSUs be installed before any other Patches/Packages in a Patch cycle.
Susan Bradley and I saw this very clearly in the spring, when this term "Race Condition" was being bandied about as a cause of BSoD conditions that were happening sporadically on some PCs generally and in fleets of up to 1000s. After a number of weeks, it finally came out that the SSU was not getting installed CONSISTENTLY on ALL PCs before every other KB/Package in that current Patch Cycle!!!
THAT IS NOT A RACE CONDITION!!! That is bad Delivery!!!
As you pointed out, a Race Condition happens closer to the CPU, between Threads and during calls, etc. In other words, in development, right before the Coders' eyes. This will be more consistent than not. What we saw was too sporadic to be that, and too small of a subset; it had to be a Packaging/Delivery process issue.
In closing, actual Race Conditions can happen, but one getting beyond the internal evaluation process, "CANARY", is much less likely than an SSU or other Package/KB getting installed at an inappropriate time or in the wrong order. I would imagine the softies in Canary see A LOT of Race Conditions, therefore it is easy to bandy the term about while they are scrambling to figure out what the hell is going on to/with the PAINED few.
Thx for your additional explanations and insights.