One of the situations in which post-mortem debugging is useful is for identifying race conditions – timing windows, in which the state of the machine changes between two points in the code execution, so that decisions made later in the code are incorrect because the state of the system has changed in some fundamental way.
In today’s case, we have what looks like a potential candidate for this, although we can’t really prove it – but then again, in my experience we never really do “prove” race conditions – we detect them, theorize them, change the code to eliminate them and hope they never come back.
Then again, I’ve seen a printf in code make the bug go away as well, so the frustrating part of these cases is that the solution is never really definitive.
In today’s case, we have a fairly mundane Windows 7 box. I recently added 4GB of memory to this Windows 7 system, and shortly thereafter it crashed (bug check 0x3B.) Naturally, having this happen right after a memory upgrade did make me slightly suspicious so I took the opportunity to explore this particular crash in a bit more detail.
One nice change (since Windows Vista) has been that the machines are configured by default to create kernel summary dumps, so that there is more that we can do than simply a superficial analysis of the problem.
Naturally, I started with the usual !analyze –v and looked at its results:
2: kd> !analyze -v ******************************************************************************* * * * Bugcheck Analysis * * * ******************************************************************************* SYSTEM_SERVICE_EXCEPTION (3b) An exception happened while executing a system service routine. Arguments: Arg1: 00000000c0000005, Exception code that caused the bugcheck Arg2: fffff8800106f141, Address of the exception record for the exception that caused the bugcheck Arg3: fffff88005f45960, Address of the context record for the exception that caused the bugcheck Arg4: 0000000000000000, zero. Debugging Details: ------------------ PEB is paged out (Peb.Ldr = 00000000`7efdf018). Type ".hh dbgerr001" for details PEB is paged out (Peb.Ldr = 00000000`7efdf018). Type ".hh dbgerr001" for details EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s. FAULTING_IP: fltmgr!TreeUnlinkMulti+51 fffff880`0106f141 488b4620 mov rax,qword ptr [rsi+20h] CONTEXT: fffff88005f45960 -- (.cxr 0xfffff88005f45960) rax=fffffaf7072f96b0 rbx=0000000000000000 rcx=fffffa8008e13318 rdx=fffffa8007e5b550 rsi=0000009d00000000 rdi=0000000000000000 rip=fffff8800106f141 rsp=fffff88005f46330 rbp=fffffa8008e13318 r8=ffffffffffffffff r9=ffffffffffffffff r10=fffffffffffffe4a r11=0000000000000001 r12=fffffa8007e5b550 r13=fffffa8007e34684 r14=0000000000004000 r15=0000000000000000 iopl=0 nv up ei pl nz na po nc cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010206 fltmgr!TreeUnlinkMulti+0x51: fffff880`0106f141 488b4620 mov rax,qword ptr [rsi+20h] ds:002b:0000009d`00000020=???????????????? Resetting default scope DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT BUGCHECK_STR: 0x3B PROCESS_NAME: CarboniteServi CURRENT_IRQL: 0 LAST_CONTROL_TRANSFER: from fffff8800106c460 to fffff8800106f141 STACK_TEXT: fffff880`05f46330 fffff880`0106c460 : fffffa80`07a96920 fffffa80`07e5b550 fffffa80`07a96920 00000000`00000000 : fltmgr!TreeUnlinkMulti+0x51 fffff880`05f46380 fffff880`0106cbe9 : fffff880`05f48000 00000000`00000002 00000000`00000000 00000000`00000000 : fltmgr!FltpPerformPreCallbacks+0x730 fffff880`05f46480 fffff880`0106b6c7 : fffffa80`08b93c10 fffffa80`07ca8de0 fffffa80`07b402c0 00000000`00000000 : fltmgr!FltpPassThrough+0x2d9 fffff880`05f46500 fffff800`02da278e : fffffa80`07e5b550 fffffa80`07dfa8e0 fffffa80`07e5b550 fffffa80`07ca8de0 : fltmgr!FltpDispatch+0xb7 fffff880`05f46560 fffff800`02a918b4 : fffffa80`07e34010 fffff800`02d8f260 fffffa80`06d17c90 00000000`ff060001 : nt!IopDeleteFile+0x11e fffff880`05f465f0 fffff800`02d900e6 : fffff800`02d8f260 00000000`00000000 fffff880`05f469e0 fffffa80`08b93c10 : nt!ObfDereferenceObject+0xd4 fffff880`05f46650 fffff800`02d85e84 : fffffa80`07c3fcd0 00000000`00000000 fffffa80`07a17b10 fffffa80`0a31e701 : nt!IopParseDevice+0xe86 fffff880`05f467e0 fffff800`02d8ae4d : fffffa80`07a17b10 fffff880`05f46940 0067006e`00000040 fffffa80`06d17c90 : nt!ObpLookupObjectName+0x585 fffff880`05f468e0 fffff800`02d1ee3c : fffffa80`08cf07e0 00000000`00000007 fffffa80`00001f01 00001f80`00f40200 : nt!ObOpenObjectByName+0x1cd fffff880`05f46990 fffff800`02a8b993 : fffffa80`0a31e7e0 00000000`00000000 fffffa80`0a31e7e0 00000000`7ef95000 : nt!NtQueryFullAttributesFile+0x14f fffff880`05f46c20 00000000`77320eba : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 00000000`0121e778 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77320eba FOLLOWUP_IP: fltmgr!TreeUnlinkMulti+51 fffff880`0106f141 488b4620 mov rax,qword ptr [rsi+20h] SYMBOL_STACK_INDEX: 0 SYMBOL_NAME: fltmgr!TreeUnlinkMulti+51 FOLLOWUP_NAME: MachineOwner MODULE_NAME: fltmgr IMAGE_NAME: fltmgr.sys DEBUG_FLR_IMAGE_TIMESTAMP: 4a5bc11f STACK_COMMAND: .cxr 0xfffff88005f45960 ; kb FAILURE_BUCKET_ID: X64_0x3B_fltmgr!TreeUnlinkMulti+51 BUCKET_ID: X64_0x3B_fltmgr!TreeUnlinkMulti+51 Followup: MachineOwner ---------
Oddly, the first thing that jumped out at me is that I had seen a similar crash recently (we use crash dump examples in our kernel debugging seminar). This reminded me that it is not uncommon to see “patterns” like this as you look at crash dumps (particularly random crash dumps like this one). It isn’t exactly the same, however, but it is intriguingly similar. See the stack trace from that crash:
3: kd> k Child-SP RetAddr Call Site fffff880`06678208 fffff800`02afcb19 nt!KeBugCheckEx fffff880`06678210 fffff800`02a7cfee nt! ?? ::FNODOBFM::`string'+0x40edb fffff880`06678370 fffff800`02a624ae nt!KiPageFault+0x16e fffff880`06678500 fffff880`010f6373 nt!RtlDeleteNoSplay+0x2a fffff880`06678530 fffff880`010f2238 fltmgr!TreeUnlinkNoBalance+0x13 fffff880`06678560 fffff880`0111046f fltmgr!TreeUnlinkMulti+0x148 fffff880`066785b0 fffff880`01110dfe fltmgr!DeleteNameCacheNodes+0x9f fffff880`066785f0 fffff880`011202af fltmgr!PurgeStreamNameCache+0x8e fffff880`06678630 fffff880`01117a30 fltmgr!FltpPurgeVolumeNameCache+0x7f fffff880`06678670 fffff880`01110d4b fltmgr! ?? ::NNGAKEGL::`string'+0x1a04 fffff880`066786b0 fffff880`010f306a fltmgr!FltpReinstateNameCachingAllFrames+0x4b fffff880`066786e0 fffff800`02a81516 fltmgr!FltpPassThroughCompletion+0x8a fffff880`06678720 fffff880`01250f30 nt!IopfCompleteRequest+0x3a6 fffff880`06678800 fffff880`012dadfa Ntfs!NtfsExtendedCompleteRequestInternal+0x110 fffff880`06678840 fffff880`01249e0c Ntfs!NtfsCommonSetInformation+0xef1 fffff880`06678920 fffff880`010f023f Ntfs!NtfsFsdSetInformation+0x11c fffff880`066789a0 fffff880`010ee6df fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f fffff880`06678a30 fffff800`02d5f49d fltmgr!FltpDispatch+0xcf fffff880`06678a90 fffff800`02a7e153 nt!NtSetInformationFile+0x909 fffff880`06678bb0 00000000`775f012a nt!KiSystemServiceCopyEnd+0x13
While not identical, what stuck out in my head is that they are both (essentially) in the same piece of logic, suggesting to me that there might be a data corruption or race condition issue going on here (although our original theory for the crash that we use in the seminar is that it is actually a single bit error).
Patterns of this type can (and have) brought insight in the past. So let’s see where this analysis takes me, and whether or not it matches anything from that previous analysis. One good point here is that I have some experience in walking the relevant data structures.
One other note about this particular machine: it’s not used for development; it’s used for video editing. As such, it isn’t running anything other than “stock” software that one might find on a typical production computer system.
In this particular case we seem to have a garden variety NULL pointer dereference:
fffff880`0106f141 488b4620 mov rax,qword ptr [rsi+20h] ds:002b:0000009d`00000020=????????????????
Thus, RSI is null in this case and the resulting user mode address is not valid. As typical in a case like this, we’ll walk backwards in the code stream to try and figure out where this value originated, since it is often the case we can glean some understanding of what this code is trying to do. See below for the code leading up to this point (via the u command):
fltmgr!TreeUnlinkMulti: fffff880`0106f0f0 fff3 push rbx fffff880`0106f0f2 55 push rbp fffff880`0106f0f3 57 push rdi fffff880`0106f0f4 4883ec30 sub rsp,30h fffff880`0106f0f8 33ff xor edi,edi fffff880`0106f0fa 488be9 mov rbp,rcx fffff880`0106f0fd 4883faff cmp rdx,0FFFFFFFFFFFFFFFFh fffff880`0106f101 0f840c010000 je fltmgr!TreeUnlinkMulti+0x123 (fffff880`0106f213) fffff880`0106f107 4c89642458 mov qword ptr [rsp+58h],r12 fffff880`0106f10c 4c8be2 mov r12,rdx fffff880`0106f10f 4983f8ff cmp r8,0FFFFFFFFFFFFFFFFh fffff880`0106f113 0f85eb450000 jne fltmgr! ?? ::FNODOBFM::`string'+0x504 (fffff880`01073704) fffff880`0106f119 4889742450 mov qword ptr [rsp+50h],rsi fffff880`0106f11e 488b31 mov rsi,qword ptr [rcx] fffff880`0106f121 4885f6 test rsi,rsi fffff880`0106f124 7518 jne fltmgr!TreeUnlinkMulti+0x4e (fffff880`0106f13e) fffff880`0106f126 488bdf mov rbx,rdi fffff880`0106f129 488b742450 mov rsi,qword ptr [rsp+50h] fffff880`0106f12e 488bc3 mov rax,rbx fffff880`0106f131 4c8b642458 mov r12,qword ptr [rsp+58h] fffff880`0106f136 4883c430 add rsp,30h fffff880`0106f13a 5f pop rdi fffff880`0106f13b 5d pop rbp fffff880`0106f13c 5b pop rbx fffff880`0106f13d c3 ret fffff880`0106f13e 488bdf mov rbx,rdi
Given that there is a ret instruction two lines earlier, we scan backwards for a jump or branch to that location. This type of code sequence is typical of an if/else style statement (where there is a return in one of the two code blocks):
fffff880`0106f124 7518 jne fltmgr!TreeUnlinkMulti+0x4e (fffff880`0106f13e)
At this point it certainly looks like there might be a logic bug here: the test instruction does a bitwise AND of the RSI register with itself:
fffff880`0106f121 4885f6 test rsi,rsi
…and then conditionally jumps based upon whether or not that value is zero:
fffff880`0106f124 7518 jne fltmgr!TreeUnlinkMulti+0x4e (fffff880`0106f13e)
Thus, this tests the value in RSI to see if it is zero. If it is, we jump down and use it, a condition that certainly does not seem to make much sense here. When I saw this it made me question if I was interpreting the instruction sequence correctly. Of course, a sanity check would be, “Well, if it weren’t zero, having it change to zero within the CPU would suggest a hardware problem of some sort.”
The encouraging insight here is that this certainly doesn’t look like it is related to the new memory added to the system.
Thus, the next logical step was to look and see, “So, where did the value in RSI originate?” That’s also present in this instruction stream:
fffff880`0106f11e 488b31 mov rsi,qword ptr [rcx]
So if we look at the value in RCX:
2: kd> dq @rcx l1 fffffa80`08e13318 fffff8a0`0d106d58
Lt is peculiar – this is not a zero value. This could be due to the fact that not all registers are saved, although the fact we see RCX is non-zero suggests that it is saved (values that are not saved on the x64 platform show as zero in the trap frame and context record).
Thus, this suggests a few possibilities:
- The value captured here is not correct;
- The value loaded in the RSI register is incorrect (ergo, a CPU problem of some sort);
- The value to which RCX points has changed since we captured it in RSI.
The hardware error theory seems rather unlikely, so let’s focus on the other two for a bit.
If we look at the data type of RCX we can see some useful information:
2: kd> !pool @rcx Pool page fffffa8008e13318 region is Nonpaged pool fffffa8008e13000 size: 150 previous size: 0 (Allocated) File (Protected) fffffa8008e13150 size: 30 previous size: 150 (Allocated) Io fffffa8008e13180 size: 50 previous size: 30 (Allocated) VadS fffffa8008e131d0 size: 80 previous size: 50 (Allocated) MmSd fffffa8008e13250 size: 20 previous size: 80 (Free) Io *fffffa8008e13270 size: c0 previous size: 20 (Allocated) *FMsl Pooltag FMsl : STREAM_LIST_CTRL structure, Binary : fltmgr.sys fffffa8008e13330 size: 150 previous size: c0 (Allocated) File (Protected) fffffa8008e13480 size: e0 previous size: 150 (Allocated) NV fffffa8008e13560 size: 160 previous size: e0 (Allocated) Ntfx fffffa8008e136c0 size: 160 previous size: 160 (Allocated) Ntfx fffffa8008e13820 size: 20 previous size: 160 (Free) FIPc fffffa8008e13840 size: 50 previous size: 20 (Allocated) VadS fffffa8008e13890 size: 160 previous size: 50 (Allocated) Ntfx fffffa8008e139f0 size: c0 previous size: 160 (Allocated) FMsl fffffa8008e13ab0 size: 160 previous size: c0 (Allocated) Ntfx fffffa8008e13c10 size: c0 previous size: 160 (Allocated) FMsl fffffa8008e13cd0 size: 10 previous size: c0 (Free) FIPc fffffa8008e13ce0 size: 60 previous size: 10 (Allocated) Io fffffa8008e13d40 size: 160 previous size: 60 (Allocated) Ntfx fffffa8008e13ea0 size: 160 previous size: 160 (Allocated) Ntfx
So this block of pool seems to include this structure. On x64 systems the pool header is 16 bytes long:
2: kd> dt _POOL_HEADER nt!_POOL_HEADER +0x000 PreviousSize : Pos 0, 8 Bits +0x000 PoolIndex : Pos 8, 8 Bits +0x000 BlockSize : Pos 16, 8 Bits +0x000 PoolType : Pos 24, 8 Bits +0x000 Ulong1 : Uint4B +0x004 PoolTag : Uint4B +0x008 ProcessBilled : Ptr64 _EPROCESS +0x008 AllocatorBackTraceIndex : Uint2B +0x00a PoolTagHash : Uint2B
To properly compute the start of the structure we need to take the start address of the pool block and add 0x10 to it (to account for the pool header):
2: kd> dt fltmgr!_STREAM_LIST_CTRL fffffa8008e13270+10 +0x000 Type : _FLT_TYPE +0x008 ContextCtrl : _FSRTL_PER_STREAM_CONTEXT +0x030 VolumeLink : _LIST_ENTRY [ 0xfffffa80`07043c60 - 0xfffffa80`08e03390 ] +0x040 Flags : 0x211 (No matching name) +0x044 UseCount : 4 +0x048 ContextLock : _EX_PUSH_LOCK +0x050 StreamContexts : _CONTEXT_LIST_CTRL +0x058 StreamHandleContexts : _CONTEXT_LIST_CTRL +0x060 NameCacheLock : _EX_PUSH_LOCK +0x068 LastRenameCompleted : _LARGE_INTEGER 0x0 +0x070 NormalizedNameCache : _NAME_CACHE_LIST_CTRL +0x080 ShortNameCache : _NAME_CACHE_LIST_CTRL +0x090 OpenedNameCache : _NAME_CACHE_LIST_CTRL +0x0a0 AllNameContextsTemporary : 0
And the value in RCX is an offset into this structure:
2: kd> ? @rcx-fffffa8008e13280 Evaluate expression: 152 = 00000000`00000098
Thus, this corresponds to something in the OpenedNameCache field. By dumping the _NAME_ CACHE_LIST_CTRL, we can see what exactly that is:
2: kd> dt _NAME_CACHE_LIST_CTRL fltmgr!_NAME_CACHE_LIST_CTRL +0x000 NameFormat : Uint4B +0x008 List : _TREE_ROOT
This in turn makes a bit of sense given the name of the function in which the fault occurs (ergo, we are manipulating a tree structure of some sort).
In expanding the _TREE_ROOT structure we see:
2: kd> dt fltmgr!_TREE_ROOT +0x000 Tree : Ptr64 _RTL_SPLAY_LINKS
Thus, I suspect that this is embedded in some larger data structure (and that structure is in turn linked into this tree). So let’s see what displaying the contents of the tree root tells us:
2: kd> dt _NAME_CACHE_LIST_CTRL @rcx-8 /b fltmgr!_NAME_CACHE_LIST_CTRL +0x000 NameFormat : 2 +0x008 List : _TREE_ROOT +0x000 Tree : 0xfffff8a0`0d106d58
We can then use this address (which, oddly enough is not zero, making this dump a bit unusual – after all, we just loaded this value into the RSI register and it was zero) to inspect this structure:
2: kd> !pool 0xfffff8a0`0d106d58 Pool page fffff8a00d106d58 region is Paged pool fffff8a00d106000 size: 130 previous size: 0 (Allocated) Ntfo fffff8a00d106130 size: 140 previous size: 130 (Allocated) MPsc fffff8a00d106270 size: 10 previous size: 140 (Free) .tFs fffff8a00d106280 size: 40 previous size: 10 (Allocated) NtFs fffff8a00d1062c0 size: 110 previous size: 40 (Allocated) ;oNm fffff8a00d1063d0 size: 190 previous size: 110 (Allocated) .Mfn fffff8a00d106560 size: 4d0 previous size: 190 (Allocated) .tff fffff8a00d106a30 size: 40 previous size: 4d0 (Allocated) NtFs fffff8a00d106a70 size: 180 previous size: 40 (Allocated) {Mfn fffff8a00d106bf0 size: c0 previous size: 180 (Allocated) oIcs fffff8a00d106cb0 size: 40 previous size: c0 (Allocated) MmSm fffff8a00d106cf0 size: 40 previous size: 40 (Allocated) .tFs *fffff8a00d106d30 size: 190 previous size: 40 (Allocated) *FMfn Pooltag FMfn : NAME_CACHE_NODE structure, Binary : fltmgr.sys fffff8a00d106ec0 size: 140 previous size: 190 (Allocated) APsc
Now let’s take this containing structure and dump it:
2: kd> dt fltmgr!_NAME_CACHE_NODE fffff8a00d106d40 +0x000 Type : _FLT_TYPE +0x008 ProvidingInstance : 0x00000081`00000000 _FLT_INSTANCE +0x010 CreationTime : _LARGE_INTEGER 0x3f`00067e77 +0x018 TreeLink : _TREE_NODE +0x050 NameInfo : _FLT_FILE_NAME_INFORMATION +0x0c8 UseCount : 1
To be honest, this structure looks a bit suspicious to me – the “creation time” doesn’t seem plausible, and that _FLT_INSTANCE address definitely does not look valid. So I decided to poke at the structure in a bit more detail:
2: kd> dt fltmgr!_NAME_CACHE_NODE fffff8a00d106d40 /b +0x000 Type : _FLT_TYPE +0x000 Signature : 0xf204 +0x002 Size : 0x176 +0x008 ProvidingInstance : 0x00000081`00000000 +0x010 CreationTime : _LARGE_INTEGER 0x3f`00067e77 +0x000 LowPart : 0x67e77 +0x004 HighPart : 63 +0x000 u :+0x000 LowPart : 0x67e77 +0x004 HighPart : 63 +0x000 QuadPart : 270583365239 +0x018 TreeLink : _TREE_NODE +0x000 Link : _RTL_SPLAY_LINKS +0x000 Parent : 0xfffff889`0d106d58 +0x008 LeftChild : 0x0000001d`00000000 +0x010 RightChild : 0xfffff835`0d17eb28 +0x018 TreeRoot : 0xfffffaff`08e13318 +0x020 Key1 : 0xfffffaf7`072f96b0 +0x028 Key2 : (null) +0x030 Flags : 0x14000 +0x050 NameInfo : _FLT_FILE_NAME_INFORMATION +0x000 Size : 0x78 +0x002 NamesParsed : 0 +0x004 Format : 2 +0x008 Name : _UNICODE_STRING "\Device\HarddiskVolume2\UsIrs\???" +0x000 Length : 0xa4 +0x002 MaximumLength : 0xa6 +0x008 Buffer : 0xfffff8a0`0d106e10 "\Device\HarddiskVolume2\UsIrs\???" +0x018 Volume : _UNICODE_STRING "\Device\HarddiskVolume2" +0x000 Length : 0x2e +0x002 MaximumLength : 0x2e +0x008 Buffer : 0xfffff8a0`0d106e10 "\Device\HarddiskVolume2" +0x028 Share : _UNICODE_STRING "--- memory read error at address 0x000000e8`00000000 ---" +0x000 Length : 0 +0x002 MaximumLength : 0 +0x008 Buffer : 0x000000e8`00000000 "--- memory read error at address 0x000000e8`00000000 ---" +0x038 Extension : _UNICODE_STRING "--- memory read error at address 0x000000f3`00000000 ---" +0x000 Length : 0 +0x002 MaximumLength : 0 +0x008 Buffer : 0x000000f3`00000000 "--- memory read error at address 0x000000f3`00000000 ---" +0x048 Stream : _UNICODE_STRING "--- memory read error at address 0x00000041`00000000 ---" +0x000 Length : 0 +0x002 MaximumLength : 0 +0x008 Buffer : 0x00000041`00000000 "--- memory read error at address 0x00000041`00000000 ---" +0x058 FinalComponent : _UNICODE_STRING "--- memory read error at address 0x000000d9`00000000 ---" +0x000 Length : 0 +0x002 MaximumLength : 0 +0x008 Buffer : 0x000000d9`00000000 "--- memory read error at address 0x000000d9`00000000 ---" +0x068 ParentDir : _UNICODE_STRING "" +0x000 Length : 0 +0x002 MaximumLength : 0 +0x008 Buffer : (null) +0x0c8 UseCount : 1
This looks somewhat valid to me, actually. The name for the volume looks to be properly set up, although several of the name components seem to be suspect – almost as if this structure is being initialized or torn down.
At this point I put together my working hypothesis: that there is a race condition present in this code somewhere, most likely in the initialization code. That still doesn’t explain what seems to be a logic issue here. I’m still left with more questions than answers – but having two crashes in the same general area, on two different machines, with two radically different usage profiles does suggest there is something interesting going on here.
Why do I propose a race condition here? Because I see information in the crash that is inconsistent – the contents of a register are NULL, but the memory location from which it was loaded indicates it should be non-zero. I cannot tell exactly when that data structure was added to this structure, but the choice seems to be “this is a CPU bug” or “this is a logic bug in this driver.” Of the two, in my experience the latter is far more likely than the former.
Hopefully, we’ll see more of these crashes so that we can find a pattern as to what is happening and broaden our analysis. It is also distinctly possible if we’re seeing this issue in “the real world” the filter manager team in Redmond has seen many more cases of this and have resolved the issue. We’ll be watching for more of these – perhaps you have a crash like this one you’d like to share with us.