We spent several months working on a very intensive (and very interesting) project that required writing a driver specifically intended to run on a Windows system running under QEMU-KVM hosted on a Linux system (specifically, RHEL 8). I’ll spare you all the gory details about why we had to actually dev and test the driver on a Windows system running in a VM hosted by Linux, but suffice it to say there was absolutely no alternative.
And, really, in the beginning, we didn’t much mind. Installing RHEL on a spare server machine, and then a Windows VM on top of that using virt-manager, is a piece of cake. And, yes, everything was swell… until we tried to hook up the kernel debugger via a network connection to that Windows VM. Thus began something like two weeks of pain, suffering, and debugging that led us to discussions with the Windows debugger people, the Hyper-V people, and the Red Hat QEMU developers. (Aside: Anybody wanna bet who the most supportive/responsive team was? Anybody??)
Anyhow, we suffered so you don’t have to. Here’s what we found out.
Overview
If you want to enable the Windows kernel debugger using network transport (KDNet) in a guest hosted on Linux under QEMU-KVM, you need to use specific VM settings.
Either the guest must be fully virtualized (that is, no hypervisor enlightenments enabled), or the hypervisor vendor ID must be set to something other than the default, which is the Windows hypervisor ID (“Microsoft Hv”).
Details of the Problem
When you create a VM on CentOS/RHEL and specify the OS type as “Windows”, a standard set of Windows enlightenments will be enabled, resulting in a para-virtualized installation. We know this is the default configuration for Windows VMs created using virt-manager and Cockpit on RHEL 8, and we suspect it is the default for any VMs created through libvirt.
The enlightenments that are enabled result in a Windows guest that runs much faster (and, reportedly, more reliably) than a Windows guest that is fully virtualized.
The enlightenments that are enabled by default include setting the hypervisor ID to the same ID that’s reported by Microsoft Hyper-V (which is “Microsoft Hv”).
The problem occurs when you enable the kernel debugger in the guest with KDNet as the transport while enlightenments are enabled and the hypervisor ID is set to “Microsoft Hv”. In this configuration, KDNet will not load successfully, and the kernel debugger in the guest will not be able to connect to WinDbg running on a remote system.
The reason is that when the KDNet transport initializes, it checks the hypervisor ID. If it discovers it is running under Microsoft Hyper-V (that is, the hypervisor ID is “Microsoft Hv”), it attempts to open a debugger connection using an undocumented protocol over a synthetic hypervisor-owned debug device that Hyper-V provides. Stock QEMU doesn’t implement that device, so the connection attempt fails.
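To make the ID check a little more concrete: a hypervisor reports its vendor ID as 12 ASCII bytes in the EBX, ECX, and EDX registers of CPUID leaf 0x40000000. The sketch below only decodes register values into that string. It does not execute CPUID, and the function name is made up for illustration (it is not KDNet's code).

```python
import struct

def decode_hv_vendor_id(ebx: int, ecx: int, edx: int) -> str:
    """Turn the three 32-bit registers of CPUID leaf 0x40000000 into the vendor string."""
    # Each register holds four ASCII bytes, least significant byte first.
    raw = struct.pack("<III", ebx, ecx, edx)
    # Short IDs are padded with NUL bytes; drop the padding.
    return raw.rstrip(b"\x00").decode("ascii")

# Register values corresponding to the two IDs discussed in this post.
HYPERV_REGS = (0x7263694D, 0x666F736F, 0x76482074)
KVM_REGS = (0x4B4D564B, 0x564B4D56, 0x0000004D)

if __name__ == "__main__":
    print(decode_hv_vendor_id(*HYPERV_REGS))
    print(decode_hv_vendor_id(*KVM_REGS))
```

The first tuple decodes to the ID that sends KDNet down the Hyper-V-only path; the second is the replacement value used in the workaround.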
Workarounds for the Problem
There are two ways to avoid this problem and enable kernel debugger connections over KDNet in Windows guests hosted by QEMU.
The first way is to use a fully virtualized installation of Windows. That is, one without any enlightenments enabled. The advantage of this approach is that it’s simple. The easiest way to accomplish this is to specify “generic” as the os-variant when you install your Windows guest. If you’re using one of the libvirt-based tools, when you examine the newly created VM’s XML description (virsh dumpxml) you will see no “<hyperv>” section in the configuration.
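If you'd rather not eyeball the XML, here is a minimal sketch of the same check in Python. It assumes you've saved or piped in the output of virsh dumpxml for your guest. The function name and both sample documents are invented for illustration and are heavily trimmed compared with a real domain description.

```python
import xml.etree.ElementTree as ET

def hyperv_summary(domain_xml: str):
    """Return (has_hyperv_section, vendor_id_or_None) for a libvirt domain XML string."""
    root = ET.fromstring(domain_xml)
    hyperv = root.find("./features/hyperv")
    if hyperv is None:
        # No enlightenments at all: the fully virtualized case.
        return (False, None)
    vendor = hyperv.find("vendor_id")
    if vendor is None or vendor.get("state") != "on":
        # Enlightened, but no explicit vendor ID override.
        return (True, None)
    return (True, vendor.get("value"))

# Hypothetical, trimmed-down domain descriptions.
SAMPLE_GENERIC = (
    "<domain type='kvm'><name>win-generic</name>"
    "<features><acpi/><apic/></features></domain>"
)
SAMPLE_ENLIGHTENED = (
    "<domain type='kvm'><name>win-pv</name>"
    "<features><acpi/><apic/>"
    "<hyperv><relaxed state='on'/><vapic state='on'/></hyperv>"
    "</features></domain>"
)

if __name__ == "__main__":
    print(hyperv_summary(SAMPLE_GENERIC))
    print(hyperv_summary(SAMPLE_ENLIGHTENED))
```

A result of (True, None) is the combination this post is warning about: enlightenments on, with no vendor ID override.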
The second method for avoiding the problem is to create the “normal” para-virtualized Windows installation, but to set the hypervisor vendor ID to something other than the Microsoft Hyper-V ID. The QEMU devs suggested that we use “KVMKVMKVM”, and this has worked fine for us.
To implement this second method with one of the libvirt-based configurations, edit the XML configuration (using, for example, virsh edit) and add the following setting to the <hyperv> section:
<vendor_id state='on' value='KVMKVMKVM'/>
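For anyone who wants to script that edit rather than do it by hand, the following is a rough sketch that operates on a domain XML string. It is an illustration only: the function name and the sample document are hypothetical, and it deliberately leaves a guest with no <hyperv> section untouched, since that guest has no enlightenments to work around. The 12-character limit reflects the 12-byte size of the vendor ID field.

```python
import xml.etree.ElementTree as ET

def set_hv_vendor_id(domain_xml: str, value: str = "KVMKVMKVM") -> str:
    """Return domain_xml with <vendor_id state='on' value=.../> set inside <hyperv>."""
    if not value or len(value) > 12:
        raise ValueError("vendor ID must be 1 to 12 characters")
    root = ET.fromstring(domain_xml)
    hyperv = root.find("./features/hyperv")
    if hyperv is None:
        # Fully virtualized guest: nothing to override.
        return domain_xml
    vendor = hyperv.find("vendor_id")
    if vendor is None:
        vendor = ET.SubElement(hyperv, "vendor_id")
    vendor.set("state", "on")
    vendor.set("value", value)
    return ET.tostring(root, encoding="unicode")

# Hypothetical, trimmed-down domain description.
SAMPLE_DOMAIN = (
    "<domain type='kvm'><name>win-pv</name>"
    "<features><acpi/><hyperv><relaxed state='on'/></hyperv></features>"
    "</domain>"
)

if __name__ == "__main__":
    print(set_hv_vendor_id(SAMPLE_DOMAIN))
```

The output would still need to be loaded back into libvirt by whatever means you normally use; this sketch only handles the XML manipulation.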
Alternatively, if you’re NOT using a libvirt-based setup, you can set the vendor ID directly on the QEMU command line by adding the following property to the -cpu option:
hv-vendor-id=KVMKVMKVM
This will enable KDNet to connect successfully.
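The rule from the two workarounds can be written down as a small check on a -cpu argument string. This is only a sketch of the rule as described in this post, not of anything QEMU or KDNet does internally. The function names are invented, and it assumes enlightenments are spelled as hv-* (or the older hv_*) properties.

```python
DEFAULT_HV_VENDOR_ID = "Microsoft Hv"

def parse_cpu_arg(cpu_arg: str):
    """Split 'model,prop=value,flag' into (model, {prop: value})."""
    model, *props = cpu_arg.split(",")
    flags = {}
    for prop in props:
        name, _, value = prop.partition("=")
        # Bare flags such as 'hv-time' are treated as switched on.
        flags[name.strip()] = value if value else "on"
    return model, flags

def vendor_id_ok_for_kdnet(cpu_arg: str) -> bool:
    """True if the guest is unenlightened or overrides the default vendor ID."""
    _, flags = parse_cpu_arg(cpu_arg)
    enlightened = any(n.startswith(("hv-", "hv_")) for n in flags)
    if not enlightened:
        return True
    vendor = flags.get("hv-vendor-id", DEFAULT_HV_VENDOR_ID)
    return vendor != DEFAULT_HV_VENDOR_ID

if __name__ == "__main__":
    print(vendor_id_ok_for_kdnet("Skylake-Server-IBRS,hv-time,hv-relaxed"))
    print(vendor_id_ok_for_kdnet("Skylake-Server-IBRS,hv-time,hv-vendor-id=KVMKVMKVM"))
```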
KDNet With Which Network Configuration?
To simplify matters, we have always dedicated a commodity NIC to the Windows VM to use for KDNet. That is, we removed the typically provisioned networking entirely, and passed in a NIC via “pass-through” from the host to the guest.
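In libvirt, PCI pass-through is expressed as a <hostdev> element in the domain XML. As a sketch, the helper below turns a host PCI address (in the domain:bus:slot.function form that lspci -D prints) into such an element. The helper name and the address 0000:03:00.0 are placeholders, not our actual hardware, and managed='yes' is just one reasonable choice.

```python
import re

_PCI_ADDR = re.compile(r"([0-9a-fA-F]{4}):([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])")

def pci_hostdev_xml(pci_addr: str) -> str:
    """Build a libvirt <hostdev> element for a host PCI device address."""
    match = _PCI_ADDR.fullmatch(pci_addr)
    if match is None:
        raise ValueError(f"expected dddd:bb:ss.f, got {pci_addr!r}")
    domain, bus, slot, function = match.groups()
    return (
        "<hostdev mode='subsystem' type='pci' managed='yes'>\n"
        "  <source>\n"
        f"    <address domain='0x{domain}' bus='0x{bus}' "
        f"slot='0x{slot}' function='0x{function}'/>\n"
        "  </source>\n"
        "</hostdev>"
    )

if __name__ == "__main__":
    # Placeholder address; substitute the address of the NIC you are dedicating.
    print(pci_hostdev_xml("0000:03:00.0"))
```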
Configuration Examples
For libvirt-based installs, the following are the <features> and <clock> sections that we have used successfully with KDNet on a guest.
<features>
  <acpi/>
  <apic/>
  <hyperv>
    <relaxed state='on'/>
    <vapic state='on'/>
    <spinlocks state='on' retries='4096'/>
    <vpindex state='on'/>
    <runtime state='on'/>
    <synic state='on'/>
    <stimer state='on'>
      <direct state='on'/>
    </stimer>
    <reset state='on'/>
    <vendor_id state='on' value='KVMKVMKVM'/>
    <frequencies state='on'/>
    <reenlightenment state='on'/>
    <tlbflush state='on'/>
    <ipi state='on'/>
    <evmcs state='on'/>
  </hyperv>
  <vmport state='off'/>
</features>
<clock offset='utc'>
  <timer name='rtc' tickpolicy='catchup'/>
  <timer name='pit' tickpolicy='delay'/>
  <timer name='hpet' present='no'/>
  <timer name='hypervclock' present='yes'/>
</clock>
For configurations that run QEMU directly from the command line, we have used the following configuration successfully:
-cpu Skylake-Server-IBRS,ss=on,vmx=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,amd-ssbd=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,mpx=off,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vendor-id=KVMKVMKVM
Longer Term Fixes
We have been in touch with the dev owners of the KDNet interface in Windows, and they agree that it would be useful for them to provide a setting to disable Windows’ use of the synthetic hypervisor-owned debug device. Someday we hope to see this.
Changes to support the undocumented communications interface between Windows and QEMU-KVM have been upstreamed for KVM, but have NOT been upstreamed for QEMU. There IS a dev (not on the Red Hat team) who has these changes, and has “promised” to upstream them to QEMU “if he has time.”
We have also communicated with the QEMU devs (who were super responsive, patient, and helpful by the way), and they will be fixing a related issue in QEMU in which the host OS version is mis-reported (in this case, CPUID for 0x40000002 currently returns 0x00001bbc, and it should return 0x00003839).
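Those two hex values are easier to interpret in decimal. The snippet below just does that conversion. It assumes the value quoted above is the EAX register of that leaf, which the Hyper-V Top-Level Functional Specification defines as a build number. The helper name is ours.

```python
def describe_build(eax: int) -> str:
    """Format a CPUID 0x40000002 EAX value as hex alongside its decimal build number."""
    return f"0x{eax:08x} = build {eax}"

if __name__ == "__main__":
    print(describe_build(0x00001BBC))  # value currently returned
    print(describe_build(0x00003839))  # value it should return
```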
Other Caveats
There are a few other things to be aware of when you run Windows in a QEMU VM that aren’t directly related to making WinDbg work with KDNet. For example, you’ll want to be sure to select the Q35 chipset in your configuration. We were also most successful when using UEFI (as opposed to BIOS-style) boot.
The End
All of the above might not sound like much, but it was the product of many, many hours of annoyance and frustration. I hope we can save you some of that time.