VMware ESXi: Hosts crash during VM shutdown with PCI passthrough

A brief note about an issue that may occur with VMware ESXi virtualization solutions when PCI passthrough is used: the host may crash when the virtual machines (guests) shut down.

VMware ESXi crashes

Users of VMware's ESXi virtualization solutions are reporting sporadic crashes. The problem occurs with various ESXi versions and configurations during virtual machine (guest) shutdown. When PCI passthrough is enabled, the ESXi host crashes with a Purple Screen of Death (similar to the Windows Blue Screen).

VMware ESXi Purple Screen of Death
(Source: administrator.de)

I stumbled upon this issue at the German site administrator.de. An affected user described such a case there and posted the above Purple Screen of Death. He wrote that the problem can occur on ESXi hosts with older PCI-E devices.

The Purple Screen of Death and how to troubleshoot it are described on this website.

The error has been occurring for years

This error pattern has been described on the Internet from time to time for many years. Here is a description from the year 2012. I have also found reports involving Nvidia graphics cards. This forum thread runs to several pages on the topic of host crashes with Nvidia drivers. A similar thread can be found here.

The affected person at administrator.de is using ESXi 6.7 U2 and observes these host crashes. The crash doesn't occur every time, but when it does, it is always while shutting down guest VMs that have PCI devices passed through. Quote:

The problem occurs with all tested ESXi configurations, …. According to other reports from the web, VMware ESXi 5.5 is said to have a much lower incidence of the problem, but it is still present in some cases. ….

The hardware used doesn't seem to play a role either, because, among others, consumer boards from MSI as well as systems from Supermicro were involved.

The affected person states that, according to forum posts, the problem is limited to older PCI-E devices. As an example, he mentions the AMD Sky500 graphics card he used (identical in construction to the AMD S7000). It belongs to the generation of the Radeon HD 7700 / HD 7800 graphics cards.

No solution, just a workaround

The affected person suspects that the PCI-E bus is not completely reset when the VM is shut down. There does not seem to be an official solution from VMware. A workaround is to disable the relevant PCI-E devices in the host's Device Manager before shutting down the VM. Is anyone among you affected by the bug, and does anyone know a fix?


4 Responses to VMware ESXi: Hosts crash during VM shutdown with PCI passthrough

  1. Alex says:

    Yep definitely do!

    I have ESXi installed on a ProLiant DL380e Gen8 hosting 2 VMs with PCI passthrough:
    – Windows VM with HP FC4 and HP SCSI passthrough devices (backup server)
    – Ubuntu VM with ASMedia eSATA controller with port multiplier support (connect to an external bay with 4x JBOD SATA drives).

    Each time I shut down the Ubuntu guest, the server restarts (but no purple screen for me, it simply restarts)

  2. Pavel says:

    Updated ESXi 5.5u3 -> ESXi 6.7u3 through complete reinstallation and reconfig.
    ESXi 5.5 had no problem, ESXi 6.7 freezes when I shut down Windows 7 machine with passthrough of PCI-E device. ARC-1220 RAID btw.

    Thinking about reverting back to ESXi 5.5u3, but there I've got nested virtualization issues. I need both nested v and passthrough. Kinda sucks.
    For clarity: I made all that on updated M/B and CPU, so experiment is not really pure.

    I must note that passthrough config looks a bit different in ESXi 6.7: it seems that ESXi 5.5 represented CPU/PCI-E Controller/PCI-E device as 3 layers, but ESXi 6.7 represents it as 2 layers CPU/PCI-E device. Perhaps PCI-E Controller is somehow no longer under proper control.

  3. Marc Weavers says:

    having this problem now with esxi 7.0b, windows 7 VM with pci-e passthrough, nvidia quadro 600, ESXi just hangs, completely unresponsive

  4. Marc Weavers says:

    after 3 successful restarts i think i have managed to fix this issue…

    i added the hardware vendor and device IDs for all graphics cards, and their attached audio devices into the /etc/vmware/passthru.map

    others had also added
    pciPassthru0.msiEnabled = false
    into the VM configs, i had added higher numbers too (pciPassthru1 to 4, just randomly)
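To make the two changes from the last comment more concrete, here is a minimal sketch in Python that only formats the text lines involved. It does not touch an ESXi host. Several things in it are assumptions on my part and not taken from the comment: the column layout of /etc/vmware/passthru.map (vendor ID, device ID, reset method, fptShareable flag, as described in the header of that file as far as I know), the list of reset methods, the default reset method `d3d0`, the quoted `"FALSE"` spelling used in .vmx files, and the helper names `passthru_map_line` and `msi_disable_lines`. The commenter did not say which reset method or IDs he used, so the values below are placeholders.

```python
import re

# Reset methods as listed in the header comment of passthru.map
# (assumption: verify against the file on your own host).
RESET_METHODS = {"flr", "d3d0", "link", "bridge", "default"}

# Vendor and device IDs are four hexadecimal digits.
HEX_ID = re.compile(r"[0-9a-fA-F]{4}")


def passthru_map_line(vendor_id, device_id,
                      reset_method="d3d0", fpt_shareable="false"):
    """Format one passthru.map entry (hypothetical helper).

    Assumed column order: vendor-id device-id resetMethod fptShareable.
    """
    for value in (vendor_id, device_id):
        if not HEX_ID.fullmatch(value):
            raise ValueError(f"not a 4-digit hex ID: {value!r}")
    if reset_method not in RESET_METHODS:
        raise ValueError(f"unknown reset method: {reset_method!r}")
    if fpt_shareable not in {"true", "false", "default"}:
        raise ValueError(f"unknown fptShareable value: {fpt_shareable!r}")
    return f"{vendor_id.lower()}  {device_id.lower()}  {reset_method}  {fpt_shareable}"


def msi_disable_lines(device_indexes):
    """Format the .vmx lines that switch off MSI for passthrough devices
    (hypothetical helper; key name taken from the comment above)."""
    return [f'pciPassthru{i}.msiEnabled = "FALSE"' for i in device_indexes]


if __name__ == "__main__":
    # 10de is the Nvidia PCI vendor ID; the device ID is a placeholder
    # and must be replaced with the ID reported for your own card.
    print(passthru_map_line("10de", "0000"))
    # The commenter set indexes 0 to 4.
    for line in msi_disable_lines(range(5)):
        print(line)
```

The printed lines would then have to be added by hand to passthru.map on the host and to the VM's .vmx file. Whether this actually cures the crash on a given system is something only a test can show.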
