KVM bug: Windows VMs can hang at boot after 11 days

Over the past few months, some administrators have complained of Windows virtual machine boot issues in connection with the monthly security updates. In many cases, this could be traced back to VMware products such as ESXi, or turning off Secure Boot helped get the VMs booting again. But there is also a bug in certain versions of the KVM virtualizer, which affects, for example, users of QEMU or Proxmox from version 7.x on: Windows virtual machines that have been running for more than 11 days may no longer boot after a restart. I'll cover this issue separately here.


The Windows boot after patch trauma

Since February, administrators of virtualized Windows Server systems have been dreading each patchday. The background: security update KB5022842, released on February 14, 2023 for Windows Server 2022, left virtual machines under various ESXi versions unable to start after a reboot. Either the system drives were no longer found, or the VMs triggered a Secure Boot error when booting. This error has been fixed by an ESXi update (see Windows Server 2022: VMware ESXi 7.0 U3k Patch for Secure Boot Issue (Update KB5022842, Feb. 2023)).

In addition, in the blog post Windows Server 2022 Feb. 2023 Patchday: Secure Boot issues also on bare metal systems! I pointed out cases where the Secure Boot error caused by security update KB5022842 also occurs on bare-metal systems. In these cases Secure Boot must remain disabled. Since this experience, Windows updates are often assumed to be the cause whenever boot problems occur with (virtualized) Windows. But the cause could also lie elsewhere.

KVM bug prevents boot

In March 2023, I received a very interesting comment from Joachim in the blog, which he also pointed out to me again by e-mail. The starting point was that German blog reader Markus S. mentioned issues with Windows Server 2019 in connection with patchday and the update installation in this comment: the server froze during the update installation. Joachim asked in this comment about the hypervisor in use:

Which hypervisor do you have? This is a known problem with KVM for example and has nothing to do with Windows Updates. The bug was quite hard to find because it only affects VMs that have been running for 11 days or more.

In his supplementary email to me, Joachim wrote (thanks for that, had the topic on the radar for a blog post anyway):

Windows VMs running on a hypervisor with a certain version of KVM for more than 11 days can hang at boot time. In almost all cases, it looks to admins like a Windows update is to blame.

Details: We have a Hyper-Converged Hypervisor running on KVM. About 2 months ago there was an upgrade during which some VMs were restarted and three VMs were no longer accessible.

The VMs showed a black screen with the Windows boot animation (circle), but still without the Windows logo.

At first we dismissed it as an anomaly, but in the following weeks the cases started to pile up. We suspected everything (Windows updates, antivirus, etc.), but of course the hypervisor upgrade was also on our radar, even though support said that it was certainly not the upgrade, because no other customer had the problem.

These are the situations that no administrator wishes for, because the causes can only be found with effort and a lot of testing and research. During his research, Joachim came across an exotic bug in KVM, which he describes as follows:


A month later, through research, I myself found the problem by accident in the Proxmox forum: KVM, starting with a certain version, had a problem with VMs that ran for more than 11 days and then restarted: Windows VMs stuck on boot after Proxmox Upgrade to 7.0 | Page 3 | Proxmox Support Forum (we don't use Proxmox!)

In the Proxmox forum, the problem is confirmed by a number of administrators. There is also a GitLab bug entry for QEMU (qemu 7.0.0 stuck at Windows boot logo with SeaBios and MBR disk), which describes the following error pattern:

When trying to boot an MBR Windows guest with SeaBios, it is stuck at the blue Windows boot logo, before the loading circle. Changing the vGPU doesn't help, 0% cpu load just frozen. Even if I boot a WinPE iso, the same happens. Even after 30 minutes, the same. Rebooted host multiple times. Since SeaBios is the default in qemu and virt-manager I imagine many VMs are installed as MBR and thus will be stuck.

This, too, is an effect that may be familiar to many administrators from installing updates under Windows. Joachim wrote to me about this:

Why is this interesting for your readers? Because almost everyone has blamed the problem on Windows updates and I haven't seen the problem in any blog so far! See e.g. comment of Peter S. here Patchday: Windows 10-Updates (14. März 2023). Because in 99% of the cases the problem is noticed only at the first restart of a VM – and restarts are mostly after Windows updates.

Also, the manufacturer of our hypervisor did not note the problem in the current release notes, even though the problem was solved by the new version. When asked, I was told that the problem was indeed there, but that no customer other than us had opened a ticket. I tried to make it clear that hardly any admin would think that the hypervisor was the problem when a Windows VM hangs on startup after Windows updates. But that fell on deaf ears. So the few admins who read the release notes unfortunately don't know that it wasn't "Windows updates again" that caused the problem. Therefore it would be nice if you would publish this.

At this point, my thanks to Joachim for the hints and the comment in the blog. I have now followed through on the plan and covered the topic here in the blog. Maybe an administrator who is affected but hasn't figured out the cause yet will read it.
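
For admins who want to know in advance which VMs might be hit at the next restart, the 11-day condition can be sketched as a small check. The following Python snippet is only an illustration under assumptions: the function names and VM names are hypothetical, the start times would have to come from your hypervisor's own tooling, and the 11-day threshold is simply the figure from the reports above, not a value taken from the KVM source.

```python
from datetime import datetime, timedelta, timezone

# Threshold as reported in the article ("more than 11 days").
# The exact boundary is an assumption, not a value from the KVM code.
UPTIME_THRESHOLD = timedelta(days=11)


def exceeds_threshold(started_at, now=None, threshold=UPTIME_THRESHOLD):
    """Return True if a VM started at `started_at` has run longer than `threshold`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - started_at) > threshold


def vms_at_risk(start_times, now=None):
    """Return the sorted names of VMs whose uptime exceeds the threshold.

    `start_times` maps a (hypothetical) VM name to its start time as a
    timezone-aware datetime. How these values are collected is left open.
    """
    return sorted(
        name
        for name, started_at in start_times.items()
        if exceeds_threshold(started_at, now)
    )


if __name__ == "__main__":
    # Made-up example data, fixed reference time for reproducibility.
    reference = datetime(2023, 4, 1, tzinfo=timezone.utc)
    inventory = {
        "win2019-files": reference - timedelta(days=20),
        "win2022-dc": reference - timedelta(days=3),
    }
    for name in vms_at_risk(inventory, now=reference):
        print(f"{name}: running for more than 11 days, may hang at next boot")
```

A list like this at least shows before a patchday which Windows VMs deserve a closer look (or a hypervisor update) before they are restarted.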

Similar articles:
Windows Server 2022: February 2023 Patchday and the ESXi VM Secure Boot Issue
Windows Server 2022 Feb. 2023 Patchday: Secure Boot issues also on bare metal systems!
Windows Server 2022: VMware ESXi 7.0 U3k Patch for Secure Boot Issue (Update KB5022842, Feb. 2023)


This entry was posted in Virtualization, Windows.
