Last reviewed and updated: 10 August 2020
Windows 10 version 1809 (build 17763), otherwise known as RS5, introduced a new pool allocator in the kernel for the first time in, well, forever. Interestingly, the newly introduced kernel pool allocator is actually the existing user mode Low Fragmentation Heap (LFH) allocator. While there’s something to be said for the “if it ain’t broke” mentality, having a single allocator in the O/S makes a good deal of sense from a maintainability perspective. Also, the user mode heap allocator has undergone significant revisions over the years to better reflect modern security practices, so it makes sense to share those benefits with kernel mode as well.
Most of us wouldn’t even notice or care that there’s a new pool allocator (except for the fact that it broke !pool, that is, ahem). However, over the years I have debugged so many BAD_POOL_HEADER bugchecks that I was curious about how the new pool allocator responded to some obvious driver bugs. Specifically, I wondered about the following cases:
- Buffer Overruns
- Double Frees
- Use After Frees
So, I seized the unique opportunity to intentionally write buggy code (and, yes, I did at one point end up with a bug in my buggy code that caused it to not be buggy). The buggy code provides IOCTLs to generate each buggy scenario and the code to handle the IOCTLs is shown in Figure 1.
    size = 268;
    allocation = ExAllocatePoolWithTag(NonPagedPool, size, 'KLSO');

    DbgPrint("!pool 0x%p (size - 0x%x)\n", allocation, size);

    switch (IoControlCode) {

        case IOCTL_OSRLK_OVERRUN: {
            // Write twice the allocated size, then free the buffer
            DbgPrint("Zeroing 0x%x\n", size * 2);
            RtlZeroMemory(allocation, size * 2);
            ExFreePool(allocation);
            break;
        }

        case IOCTL_OSRLK_DOUBLE_FREE: {
            // Free the same allocation twice
            DbgPrint("Freeing twice\n");
            ExFreePool(allocation);
            ExFreePool(allocation);
            break;
        }

        case IOCTL_OSRLK_USE_AFTER_FREE: {
            // Free the allocation, then write to it
            DbgPrint("Freeing then zeroing\n");
            ExFreePool(allocation);
            RtlZeroMemory(allocation, size);
            break;
        }
    }
I then ran each test to pit the Windows 7 allocator against the Windows 10 19H1 allocator to see which one performed better in detecting the bugs. Note that this was not a rigorous, scientific study involving thousands of iterations. Each one was run about three times max to validate that the behavior was at least somewhat repeatable.
Now, without further ado, the results…
Overrun Challenge
Windows 7
On Windows 7 the system immediately crashed with a BAD_POOL_HEADER:
    BAD_POOL_HEADER (19)
    The pool is already corrupt at the time of the current request.
    This may or may not be due to the caller.
    The internal pool links must be walked to figure out a possible cause of
    the problem, and then special pool applied to the suspect tags or the driver
    verifier to a suspect driver.
    Arguments:
    Arg1: 0000000000000020, a pool block header size is corrupt.
    Arg2: fffffa801a26bde0, The pool entry we were looking for within the page.
    Arg3: fffffa801a26bf00, The next pool entry.
    Arg4: 000000000412003a, (reserved)
Running !pool on the second argument to the bugcheck walks the pool page and shows us where we went off a cliff:
    1: kd> !pool @$bug_param2
    Pool page fffffa801a26bde0 region is Nonpaged pool
     fffffa801a26b000 size:  5c0 previous size:    0  (Allocated)  Txrn
     fffffa801a26b5c0 size:  1b0 previous size:  5c0  (Free)       Free
     fffffa801a26b770 size:   c0 previous size:  1b0  (Allocated)  FMsl
     fffffa801a26b830 size:  150 previous size:   c0  (Allocated)  File (Protected)
     fffffa801a26b980 size:   c0 previous size:  150  (Allocated)  FMsl
     fffffa801a26ba40 size:  3a0 previous size:   c0  (Free)       FMic
    *fffffa801a26bde0 size:  120 previous size:  3a0  (Free )     *OSLK
            Owning component : Unknown (update pooltag.txt)

    fffffa801a26bf00 doesn't look like a valid small pool allocation, checking to see
    if the entire page is actually part of a large page allocation...
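To make the walk concrete, here is a minimal C sketch of the kind of consistency check that walking a pool page implies. It is a user mode model, not the kernel's code: every `sim_` name is hypothetical, and the 16-byte header and 16-byte granularity are assumptions about 64-bit Windows 7. They do happen to agree with the output above: 268 bytes plus a 16-byte header rounds up to 0x120, and 0xde0 + 0x120 is 0xf00.

```c
#include <stdint.h>
#include <string.h>

#define SIM_PAGE_SIZE   4096u
#define SIM_GRANULARITY 16u   /* assumed x64 pool block granularity */
#define SIM_HEADER_SIZE 16u   /* assumed x64 pool header size */

/* Hypothetical, simplified stand-in for a legacy pool block header. */
typedef struct sim_pool_header {
    uint8_t  previous_size; /* previous block size, in 16-byte units */
    uint8_t  pool_index;
    uint8_t  block_size;    /* this block's size, in 16-byte units */
    uint8_t  pool_type;
    uint32_t pool_tag;
    uint64_t reserved;
} sim_pool_header;

/* Bytes consumed by a request once the header and rounding are added. */
uint32_t sim_block_bytes(uint32_t request)
{
    uint32_t total = request + SIM_HEADER_SIZE;
    return (total + SIM_GRANULARITY - 1u) & ~(SIM_GRANULARITY - 1u);
}

/* Lay out consecutive blocks with the given sizes (in bytes) in a page. */
void sim_build_page(uint8_t *page, const uint32_t *sizes, size_t count)
{
    uint32_t offset = 0, previous = 0;
    for (size_t i = 0; i < count; i++) {
        sim_pool_header h = {0};
        h.previous_size = (uint8_t)(previous / SIM_GRANULARITY);
        h.block_size = (uint8_t)(sizes[i] / SIM_GRANULARITY);
        memcpy(page + offset, &h, sizeof h);
        previous = sizes[i];
        offset += sizes[i];
    }
}

/* Walk the page header by header. Returns the offset of the first header
 * whose sizes are inconsistent, or -1 if the whole page checks out. */
long sim_walk_page(const uint8_t *page)
{
    uint32_t offset = 0, previous = 0;
    while (offset < SIM_PAGE_SIZE) {
        sim_pool_header h;
        memcpy(&h, page + offset, sizeof h);
        uint32_t size = (uint32_t)h.block_size * SIM_GRANULARITY;
        if (size == 0 ||
            (uint32_t)h.previous_size * SIM_GRANULARITY != previous ||
            offset + size > SIM_PAGE_SIZE) {
            return (long)offset;
        }
        previous = size;
        offset += size;
    }
    return -1;
}
```

Building a page with the block sizes from the !pool output (plus one 0x100-byte block to fill the page), then zeroing 536 bytes starting at offset 0xdf0 where the 268-byte buffer begins, makes `sim_walk_page` stop at 0xf00. That is the same offset at which !pool gave up.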
Windows 10 19H1
Running the same test on Windows 10 produced no crash. Running !pool on the freed buffer shows a corruption of the page just like on Windows 7:
    1: kd> !pool 0xFFFFCD02FB902050
    Pool page ffffcd02fb902050 region is Nonpaged pool
     ffffcd02fb902000 size:   30 previous size:    0  (Free)       ....

    ffffcd02fb902040 doesn't look like a valid small pool allocation, checking to see
    if the entire page is actually part of a large page allocation...
But no BAD_POOL_HEADER crash.
Interestingly, I ran the test again and this time I did hit a crash. However, it was an IRQL_NOT_LESS_OR_EQUAL bugcheck in the bowels of the heap allocator on the next allocation:
    IRQL_NOT_LESS_OR_EQUAL (a)
    An attempt was made to access a pageable (or completely invalid) address at an
    interrupt request level (IRQL) that is too high. This is usually
    caused by drivers using improper addresses.
    If a kernel debugger is available get the stack backtrace.
    Arguments:
    Arg1: ffffcd02fb8b5022, memory referenced
    Arg2: 0000000000000002, IRQL
    Arg3: 0000000000000000, bitfield :
        bit 0 : value 0 = read operation, 1 = write operation
        bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
    Arg4: fffff80574452c49, address which referenced memory

    0: kd> kc
     # Call Site
    00 nt!DbgBreakPointWithStatus
    01 nt!KiBugCheckDebugBreak
    02 nt!KeBugCheck2
    03 nt!KeBugCheckEx
    04 nt!KiBugCheckDispatch
    05 nt!KiPageFault
    06 nt!RtlpHpVsContextAllocateInternal
    07 nt!ExAllocateHeapPool
    08 nt!ExAllocatePoolWithTag
    09 OSRLK!OSRLKEvtIoDeviceControl
Overrun Challenge Winner: Windows 7. Corruption was detected immediately when the buffer was freed and we were provided a clear bugcheck description.
Double Free Challenge
Windows 7
On Windows 7 the system immediately crashed with a BAD_POOL_CALLER:
    BAD_POOL_CALLER (c2)
    The current thread is making a bad pool request. Typically this is at a bad IRQL level or double freeing the same allocation, etc.
    Arguments:
    Arg1: 0000000000000007, Attempt to free pool which was already freed
    Arg2: 0000000000001097, Pool tag value from the pool header
    Arg3: 0000000004120009, Contents of the first 4 bytes of the pool header
    Arg4: fffffa801b3de9f0, Address of the block of pool being deallocated
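Arg3 is worth a second look, since it holds the first four bytes of the pool header. The following is a small sketch of how those bytes can be pulled apart. It assumes the 64-bit Windows 7 header packs PreviousSize, PoolIndex, BlockSize and PoolType into one byte each, in that order, with sizes in 16-byte units. The `sim_` names are made up for this example, and the layout is an assumption rather than something taken from the bugcheck text.

```c
#include <stdint.h>

/* Hypothetical decoded view of the first four bytes of a pool header. */
typedef struct sim_header_fields {
    uint32_t previous_size_bytes; /* size of the preceding block, in bytes */
    uint32_t pool_index;
    uint32_t block_size_bytes;    /* size of this block, in bytes */
    uint32_t pool_type;
} sim_header_fields;

/* Split the header dword into its four assumed 8-bit fields and scale the
 * two size fields from 16-byte units to bytes. */
sim_header_fields sim_decode_header_dword(uint32_t value)
{
    sim_header_fields f;
    f.previous_size_bytes = (value & 0xFFu) * 16u;
    f.pool_index          = (value >> 8) & 0xFFu;
    f.block_size_bytes    = ((value >> 16) & 0xFFu) * 16u;
    f.pool_type           = (value >> 24) & 0xFFu;
    return f;
}
```

Under those assumptions, 0x04120009 decodes to a block size of 0x120 bytes, which is the size !pool reported for our 268-byte allocation in the overrun test, preceded by a 0x90-byte block.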
Windows 10 19H1
Running the same test on Windows 10 produced no crash. Much like last time, running the test a second time did indeed result in a system crash, though this time it was properly at the point of the second free:
    KERNEL_MODE_HEAP_CORRUPTION (13a)
    The kernel mode heap manager has detected corruption in a heap.
    Arguments:
    Arg1: 0000000000000011, Type of corruption detected
    Arg2: ffff91030de00100, Address of the heap that reported the corruption
    Arg3: ffff91030dd133a0, Address at which the corruption was detected
    Arg4: 0000000000000000
Curious about what Arg1 == 0x11 meant, I took a SWAG and grep’d the type information for something related to “heap” and “type”:
    0: kd> dt nt!_heap*type*
              ntkrnlmp!_HEAP_SEG_RANGE_TYPE
              ntkrnlmp!_HEAP_FAILURE_TYPE
That was lucky! Dumping HEAP_FAILURE_TYPE we see that 0x11 (0n17) maps to heap_failure_segment_lfh_double_free:
    0: kd> dt nt!_HEAP_FAILURE_TYPE
       heap_failure_internal = 0n0
       heap_failure_unknown = 0n1
       heap_failure_generic = 0n2
       heap_failure_entry_corruption = 0n3
       heap_failure_multiple_entries_corruption = 0n4
       heap_failure_virtual_block_corruption = 0n5
       heap_failure_buffer_overrun = 0n6
       heap_failure_buffer_underrun = 0n7
       heap_failure_block_not_busy = 0n8
       heap_failure_invalid_argument = 0n9
       heap_failure_invalid_allocation_type = 0n10
       heap_failure_usage_after_free = 0n11
       heap_failure_cross_heap_operation = 0n12
       heap_failure_freelists_corruption = 0n13
       heap_failure_listentry_corruption = 0n14
       heap_failure_lfh_bitmap_mismatch = 0n15
       heap_failure_segment_lfh_bitmap_corruption = 0n16
       heap_failure_segment_lfh_double_free = 0n17
       heap_failure_vs_subsegment_corruption = 0n18
       heap_failure_null_heap = 0n19
       heap_failure_allocation_limit = 0n20
       heap_failure_commit_limit = 0n21
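Since the debugger doesn't translate Arg1 for us, a tiny lookup table saves repeating the guesswork next time. This is just a sketch: the names are copied from the dt output above for this particular build, and `heap_failure_name` is a helper invented for this example, not a Windows API. The values may differ on other builds.

```c
#include <stddef.h>

/* Names indexed by value, as dumped from nt!_HEAP_FAILURE_TYPE above. */
static const char *const heap_failure_names[] = {
    "heap_failure_internal",                      /* 0n0  */
    "heap_failure_unknown",                       /* 0n1  */
    "heap_failure_generic",                       /* 0n2  */
    "heap_failure_entry_corruption",              /* 0n3  */
    "heap_failure_multiple_entries_corruption",   /* 0n4  */
    "heap_failure_virtual_block_corruption",      /* 0n5  */
    "heap_failure_buffer_overrun",                /* 0n6  */
    "heap_failure_buffer_underrun",               /* 0n7  */
    "heap_failure_block_not_busy",                /* 0n8  */
    "heap_failure_invalid_argument",              /* 0n9  */
    "heap_failure_invalid_allocation_type",       /* 0n10 */
    "heap_failure_usage_after_free",              /* 0n11 */
    "heap_failure_cross_heap_operation",          /* 0n12 */
    "heap_failure_freelists_corruption",          /* 0n13 */
    "heap_failure_listentry_corruption",          /* 0n14 */
    "heap_failure_lfh_bitmap_mismatch",           /* 0n15 */
    "heap_failure_segment_lfh_bitmap_corruption", /* 0n16 */
    "heap_failure_segment_lfh_double_free",       /* 0n17 */
    "heap_failure_vs_subsegment_corruption",      /* 0n18 */
    "heap_failure_null_heap",                     /* 0n19 */
    "heap_failure_allocation_limit",              /* 0n20 */
    "heap_failure_commit_limit",                  /* 0n21 */
};

/* Map Arg1 of a 0x13A bugcheck to a name, or a placeholder if the value
 * is outside the table dumped above. */
const char *heap_failure_name(unsigned long long arg1)
{
    size_t count = sizeof heap_failure_names / sizeof heap_failure_names[0];
    return arg1 < count ? heap_failure_names[arg1] : "unknown value";
}
```

Calling `heap_failure_name(0x11)` gives back `heap_failure_segment_lfh_double_free`, matching what we worked out by hand.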
So there is double free detection, but for some reason it didn’t trigger on the first pass.
Double Free Challenge Winner: Very close, but Windows 7 wins because it caught the bug the first time, every time we tested. Windows 10 also lost points because the debugger didn’t explain the corruption type in Arg1 and we only found its meaning through a lucky guess.
Use After Free Challenge
Windows 7
Running this test on Windows 7 produced no crash.
Windows 10 19H1
Running this test on Windows 10 produced no crash.
Use After Free Challenge Winner: Tie. To be fair, it would take a lot of extra processing in the allocator to find this bug, so it is not surprising that neither allocator caught it.
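To get a feel for that extra processing, here is a hypothetical sketch of one classic approach: fill a block with a known pattern when it is freed, then check the pattern before handing the block out again. This is not what either Windows allocator does by default. The `sim_` names and the 0xFE fill byte are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_FREE_FILL 0xFEu /* arbitrary poison byte chosen for this sketch */

/* Called when a block is freed: overwrite the whole payload. */
void sim_poison_on_free(uint8_t *block, size_t size)
{
    memset(block, SIM_FREE_FILL, size);
}

/* Called before the block is reused. Returns 1 if the payload is untouched
 * since the free, 0 if something wrote to it in the meantime. The cost is
 * a full scan of the block on every reuse. */
int sim_poison_intact(const uint8_t *block, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (block[i] != SIM_FREE_FILL) {
            return 0;
        }
    }
    return 1;
}
```

Even this simple scheme touches every byte of every block twice, only notices the write when the block is next reused, and misses reads of freed memory entirely.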
Overall Results
Though there are surely benefits to the new allocator, in our opinion the old allocator wins in its ability to detect pool corruption in the cases tested.
What About With Driver Verifier?
Of course, the best way to find pool corruptions is with Driver Verifier and Special Pool. I’m happy to report that both allocators caught all three bugs equally, so no loss in functionality there.