VMware ESXi: Hosts crash during VM shutdown with PCI passthrough

A brief note about an issue that may occur with VMware ESXi virtualization solutions when PCI passthrough is used: the host may crash when virtual machines (guests) shut down.



VMware ESXi crashes

Users of VMware's ESXi virtualization solutions report sporadic host crashes. The problem occurs with various ESXi versions and configurations while a virtual machine (guest) is shutting down. When PCI passthrough is enabled, the ESXi host can crash with a Purple Screen of Death (similar to the Windows Blue Screen).

VMware ESXi Purple Screen of Death
(Source: administrator.de)

I stumbled upon this issue at the German site administrator.de. An affected user described such a case there and posted the Purple Screen of Death shown above. He wrote that the problem can occur on ESXi hosts with older PCI-E devices.

The Purple Screen of Death and how to troubleshoot it are described on this website.

The error has been occurring for years

The error pattern has been described on the Internet from time to time for many years. Here is a description from the year 2012. I have also found reports involving Nvidia graphics cards. This forum thread spans several pages on the topic of host crashes with Nvidia drivers. A similar thread can be found here.

The affected person at administrator.de is using ESXi 6.7 U2 and observes these host crashes. They do not occur every time, but when they do, it is always while guest VMs that have PCI devices passed through are shutting down. Quote:



The problem occurs with all tested ESXi configurations, …. According to other reports from the web, VMware ESXi 5.5 is said to have a much lower incidence of the problem, but it is still present in some cases. ….

The hardware used doesn't seem to play a role either, because, among others, consumer boards from MSI as well as systems from Supermicro were involved.

The affected person states that, according to forum posts, the problem is limited to older PCI-E devices. As an example, he mentions the AMD Sky500 graphics card he used (identical in construction to the AMD S7000). It belongs to the generation of the Radeon HD 7700/HD 7800 graphics cards.

No solution, just a workaround

The affected person suspects that the PCI-E bus is not completely reset when the VM is shut down. There does not seem to be an official solution from VMware. A workaround is to disable the relevant PCI-E devices in Device Manager inside the guest before shutting down the VM (ESXi itself has no Device Manager, so this refers to a Windows guest). Are any of you affected by the bug, and do you know a fix?
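The workaround could in principle be scripted inside the guest. The following is only a minimal sketch, not something taken from the administrator.de post: it assumes a Windows guest, administrator rights, and a Windows build whose `pnputil` supports `/disable-device` (Windows 10 version 2004 or later). The device instance ID is a placeholder, and the function names are hypothetical. By default the script only prints what it would do.

```python
import subprocess
from typing import Callable, List, Sequence

# Placeholder only. Look up the real value inside the guest in Device Manager
# (device properties > Details > "Device instance path").
EXAMPLE_DEVICE_ID = r"PCI\VEN_XXXX&DEV_XXXX\PLACEHOLDER"


def build_shutdown_commands(device_ids: Sequence[str]) -> List[List[str]]:
    """Return the commands that disable each device and then power off the guest."""
    commands = [["pnputil", "/disable-device", dev] for dev in device_ids]
    # The shutdown command comes last so the devices are already disabled.
    commands.append(["shutdown", "/s", "/t", "0"])
    return commands


def disable_and_shutdown(
    device_ids: Sequence[str],
    run: Callable = subprocess.run,
    dry_run: bool = True,
) -> List[List[str]]:
    """Disable the passed-through devices, then shut down the guest.

    With dry_run=True (the default) the commands are only printed.
    With dry_run=False a failing command raises an exception, so the
    shutdown is not triggered if a device could not be disabled.
    """
    processed = []
    for cmd in build_shutdown_commands(device_ids):
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            run(cmd, check=True)
        processed.append(cmd)
    return processed


if __name__ == "__main__":
    # Dry run: prints the two commands without executing anything.
    disable_and_shutdown([EXAMPLE_DEVICE_ID])
```

Whether disabling the device this way actually prevents the host crash is untested here; it merely automates the manual step described above.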

