Well, this one took us by surprise…
MmBuildMdlForNonPagedPool is the standard shortcut function drivers use to build MDLs describing non-pageable memory. Despite the name, the buffer described by the MDL does not necessarily need to come from non-paged pool. For example, the documentation says it’s legal to call MmBuildMdlForNonPagedPool on the buffer returned by MmAllocateContiguousMemorySpecifyCache, which is definitely NOT a non-paged pool address.
This description led me to believe that the API could be used for ANYthing that’s non-pageable. For example, it always seemed reasonable to use this API for MDLs describing kernel stack addresses. The only way for the stack to become pageable is for the thread to wait with a WaitMode of UserMode (for example, by calling KeWaitForSingleObject), so as long as the driver doesn’t do that it should be all good.
Imagine my surprise, then, when we hit this Driver Verifier bugcheck while testing a driver under RS4 (Windows 10, version 1803):
DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
A device driver attempting to corrupt the system has been caught. This is
because the driver was specified in the registry as being suspect (by the
administrator) and the kernel has enabled substantial checking of this driver.
If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will
be among the most commonly seen crashes.
Arguments:
Arg1: 00000140, Non-locked MDL constructed from either pageable or tradable memory.
Arg2: 00000000, Current IRQL level.
Arg3: c579cfe0, MDL address
Arg4: 8889f000, Associated virtual address with this MDL
...
04 nt!KeBugCheckEx
05 nt!VerifierBugCheckIfAppropriate
06 nt!VerifierMmBuildMdlForNonPagedPool
07 Confused!ConFsdUtilScsiSendSrbInternal
08 Confused!ConFsdkUtilScsiSendSrb
09 Confused!ConFsdUtilQueryDiskInformation
The buffer in question is an active stack address and thus should NOT be pageable at this time. I assumed this had to be a Driver Verifier problem, but decided to check the documentation for MmBuildMdlForNonPagedPool and make sure that I hadn’t missed something. Much to my surprise, there is now a paragraph to explicitly forbid calling MmBuildMdlForNonPagedPool with kernel stack addresses:
MmBuildMdlForNonPagedPool may not be used with MDLs describing buffers allocated on a kernel stack. To build an MDL describing a kernel stack buffer, drivers must call MmProbeAndLockPages. This rule applies even if the driver guarantees that the kernel stack cannot be paged out.
That paragraph is absolutely maddening. Suddenly my driver fails Driver Verifier and there’s a new paragraph in the docs to tell me that I’m doing something evil, but neither of them gives me a single reason why.
At this moment I felt personally attacked and whined for as long as anyone at the office would listen, before begrudgingly changing my code (Driver Verifier always wins…).
Of course, even after changing the code to use MmProbeAndLockPages, the fact that I didn’t know why I had to change my code was driving me nuts. This led me back to the bugcheck description and a part that I missed my first time through (emphasis mine):
Non-locked MDL constructed from either pageable or tradable (sic) memory.
Aha! I was so focused on whether or not the buffer was pageable that I hadn’t noticed the reference to tradeable.
In the interest of satisfying physically contiguous memory allocations, the Memory Manager might move (“trade”) an active buffer from one physical page to another. This effectively allows the Memory Manager to defragment free regions of physical memory and thus make it more likely to satisfy physically contiguous allocations. The defragmentation is done by copying existing pages to new locations in memory, then updating the virtual addresses that use those pages to point to the new locations.
Certain buffers, such as non-paged pool, are exempt from this trading. Any buffers locked using MmProbeAndLockPages are exempt as well. However, kernel stacks are not exempt from this trading (see MiSwapStackPage)! Thus it is, in fact, VERY wrong to use MmBuildMdlForNonPagedPool with a kernel stack address, as the underlying physical page could move even while the stack is resident in memory. An MDL built that way records the physical pages as they were when it was built, so after a trade it describes memory that no longer backs the buffer.
In researching this more, we tracked this behavior back to sometime in the Windows Vista release cycle. So, not only is this operation invalid, it’s been invalid for a REALLY long time. Thankfully, even though I’ve assumed for quite a while that this was safe, this is the first driver where we’ve actually built an MDL over a stack buffer this way.
I suggest that you all search your code bases for references to MmBuildMdlForNonPagedPool and make absolutely sure that you never call it with a kernel stack buffer.
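For any call site that does turn up, the replacement the documentation asks for looks roughly like the sketch below. Treat it as a sketch and not as the code from our driver: the two Example helper names are hypothetical, and IoModifyAccess assumes the lower driver will write into the buffer. The calls themselves (IoAllocateMdl, MmProbeAndLockPages, MmUnlockPages, IoFreeMdl) are the standard WDM routines.

```c
#include <wdm.h>

// Hypothetical helper: allocate an MDL for a buffer on the caller's kernel
// stack and lock its pages, so they can be neither paged out nor traded.
NTSTATUS
ExampleLockStackBuffer(
    _In_reads_bytes_(Length) PVOID StackBuffer,
    _In_ ULONG Length,
    _Out_ PMDL *LockedMdl
    )
{
    PMDL mdl;

    *LockedMdl = NULL;

    mdl = IoAllocateMdl(StackBuffer, Length, FALSE, FALSE, NULL);
    if (mdl == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    __try {
        // Assumption: the buffer is written to, hence IoModifyAccess.
        // MmProbeAndLockPages raises an exception on failure.
        MmProbeAndLockPages(mdl, KernelMode, IoModifyAccess);
    } __except (EXCEPTION_EXECUTE_HANDLER) {
        IoFreeMdl(mdl);
        return GetExceptionCode();
    }

    *LockedMdl = mdl;
    return STATUS_SUCCESS;
}

// Hypothetical helper: undo ExampleLockStackBuffer once the I/O has completed.
VOID
ExampleUnlockStackBuffer(
    _In_ PMDL LockedMdl
    )
{
    MmUnlockPages(LockedMdl);
    IoFreeMdl(LockedMdl);
}
```

Because the buffer lives on the stack, the function that owns it has to wait for the request to complete and call the unlock helper before it returns. Otherwise the MDL ends up describing a stack frame that no longer exists.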
Driver Verifier FTW
We have a saying around here at OSR: If it hasn’t been tested with Driver Verifier running, then it hasn’t been tested! While this statement holds true, this bug is a great example of why it’s always important to test with the latest and greatest version of Driver Verifier. Even if your customers don’t care about RS4, running your driver under the RS4 version of Driver Verifier is only a good thing.