In the last issue of The NT Insider we published a write-up describing the analysis of a crash dump in Filter Manager. A couple of readers commented on the analysis and raised some solid points that needed to be considered.
This underscores an important aspect of crash analysis: it helps to obtain a second opinion, precisely because it is easy to miss an important point along the way.
Now on to the specific issues raised by the readers:
- The value of the RSI register is in fact not null (0000009d00000000).
- There is a backwards jump in the instruction stream that affects the code flow analysis.
- There is some useful information about determining the type of trap frame, which in turn tells us more about which registers are valid.
x64 Trap Frame Observations
On the x64 platform, a kernel trap frame does not capture all register state:
- For a processor exception or trap, the OS only captures the volatile registers (RAX, RCX, RDX, R8-R11 and XMM0-XMM5) and the RBP register.
- For a system call entry, the OS only captures RBP, RSI and RDI. No other registers are preserved.
By using the ExceptionActive field, we can determine which type of trap frame this is (a 0 or 1 value indicates this is an exception or trap and the volatile registers plus RBP are stored).
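The rule above can be written down as a small lookup. This is only a sketch of the text, not of any Windows API: the function name is hypothetical, and treating every value other than 0 or 1 as a system call frame is an assumption made for the illustration.

```python
# Sketch: which registers a x64 kernel trap frame can be trusted for,
# based on the ExceptionActive value described in the text.
# Assumption (not from the article): any value other than 0 or 1 is
# treated as a system call entry frame.

VOLATILE_REGISTERS = (
    ["rax", "rcx", "rdx", "r8", "r9", "r10", "r11"]
    + ["xmm%d" % i for i in range(6)]  # XMM0-XMM5
)


def captured_registers(exception_active):
    """Return the register names the trap frame is expected to hold."""
    if exception_active in (0, 1):
        # Exception or trap: volatile registers plus RBP.
        return VOLATILE_REGISTERS + ["rbp"]
    # System call entry: only RBP, RSI and RDI.
    return ["rbp", "rsi", "rdi"]


if __name__ == "__main__":
    for value in (0, 1, 2):
        print(value, captured_registers(value))
```

Note that RSI, the register at the heart of this crash, is absent from the exception-type list, which is why the distinction matters when reading register values out of a trap frame rather than a full context record.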
Further, the reader observed:
You can very often get reliable nonvolatile registers on x64 from “.frame /r (frame number)”. There is also the newly-documented (but long-present) “.frame /c (frame number)” command that sets your effective context to the values obtained from .frame /r. This works using the unwind metadata generated by the x64 compiler that the debugger stackwalker uses (it also works for Itanium, if you should be debugging that, but not x86). It should _always_ give you correct nonvolatile registers if you start from the context obtained by .cxr, .thread, or .exptr.
This is useful information in general, and we hope it will help our readers further hone their debugging skills.
Code Flow Analysis
This reader had some valid observations here:
The first constructive point is that if your debugging takes you into the game of back-tracing, you need to study whole functions. This means unassembling not just at addresses before the faulting instruction, nor after, but at all places that fragments of the function have got scattered by optimisation. Basically, you need to raise your debugging to the foothills of reverse engineering. The reverse engineer will see that the faulting instruction, “mov rax,qword ptr [rsi+20h]” at …F141 is picked up for TreeUnlinkMulti by inlining TreeUnlinkMultiDoWalk, which in turn inlines TreeLookup, which in turn inlines TreeFindNodeOrParent. The loop that the analyst has missed is actually from the start of this last subroutine. The code’s overall intention is to walk a given tree, remove the nodes that match a given pair of keys, and return these nodes as a list (linked through the RightChild members only).
The reality is that on x64 we far more often need to backtrack through the code flow in order to find local variables and reconstruct the stack. When doing a thorough analysis, it is indeed important to look at the entire function (the uf command is good for this), but this is more time-consuming (and certainly more daunting to those just approaching kernel debugging).
But these are valid points.
So with this said, let’s go back and revisit our analysis. The context record shows us:
2: kd> .cxr fffff88005f45960
rax=fffffaf7072f96b0 rbx=0000000000000000 rcx=fffffa8008e13318
rdx=fffffa8007e5b550 rsi=0000009d00000000 rdi=0000000000000000
rip=fffff8800106f141 rsp=fffff88005f46330 rbp=fffffa8008e13318
r8=ffffffffffffffff r9=ffffffffffffffff r10=fffffffffffffe4a
r11=0000000000000001 r12=fffffa8007e5b550 r13=fffffa8007e34684
r14=0000000000004000 r15=0000000000000000
iopl=0 nv up ei pl nz na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00010206
fltmgr!TreeUnlinkMulti+0x51:
fffff880`0106f141 488b4620 mov rax,qword ptr [rsi+20h] ds:002b:0000009d`00000020=????????????????
And of course the registers are valid here (they are all captured in the context record), as our reader noted.
This gives us the following stack:
2: kd> kv
*** Stack trace for last set context - .thread/.cxr resets it
Child-SP RetAddr : Args to Child : Call Site
fffff880`05f46330 fffff880`0106c460 : fffffa80`07a96920 fffffa80`07e5b550 fffffa80`07a96920 00000000`00000000 : fltmgr!TreeUnlinkMulti+0x51
fffff880`05f46380 fffff880`0106cbe9 : fffff880`05f48000 00000000`00000002 00000000`00000000 00000000`00000000 : fltmgr!FltpPerformPreCallbacks+0x730
fffff880`05f46480 fffff880`0106b6c7 : fffffa80`08b93c10 fffffa80`07ca8de0 fffffa80`07b402c0 00000000`00000000 : fltmgr!FltpPassThrough+0x2d9
fffff880`05f46500 fffff800`02da278e : fffffa80`07e5b550 fffffa80`07dfa8e0 fffffa80`07e5b550 fffffa80`07ca8de0 : fltmgr!FltpDispatch+0xb7
fffff880`05f46560 fffff800`02a918b4 : fffffa80`07e34010 fffff800`02d8f260 fffffa80`06d17c90 00000000`ff060001 : nt!IopDeleteFile+0x11e
fffff880`05f465f0 fffff800`02d900e6 : fffff800`02d8f260 00000000`00000000 fffff880`05f469e0 fffffa80`08b93c10 : nt!ObfDereferenceObject+0xd4
fffff880`05f46650 fffff800`02d85e84 : fffffa80`07c3fcd0 00000000`00000000 fffffa80`07a17b10 fffffa80`0a31e701 : nt!IopParseDevice+0xe86
fffff880`05f467e0 fffff800`02d8ae4d : fffffa80`07a17b10 fffff880`05f46940 0067006e`00000040 fffffa80`06d17c90 : nt!ObpLookupObjectName+0x585
fffff880`05f468e0 fffff800`02d1ee3c : fffffa80`08cf07e0 00000000`00000007 fffffa80`00001f01 00001f80`00f40200 : nt!ObOpenObjectByName+0x1cd
fffff880`05f46990 fffff800`02a8b993 : fffffa80`0a31e7e0 00000000`00000000 fffffa80`0a31e7e0 00000000`7ef95000 : nt!NtQueryFullAttributesFile+0x14f
fffff880`05f46c20 00000000`77320eba : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ fffff880`05f46c20)
00000000`0121e778 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77320eba
Then let’s look at the invalid address:
2: kd> !pte 0000009d00000000
VA 0000009d00000000
PXE at FFFFF6FB7DBED008 PPE at FFFFF6FB7DA013A0 PDE at FFFFF6FB40274000 PTE at FFFFF6804E800000
contains 0000000000000000
not valid
The !pte command decodes the address, locates the page table entries that map it, and displays each of them. Here the top-level entry (the PXE) is zero and marked not valid, so nothing is mapped anywhere in this 512GB region (each PXE corresponds to a 512GB region of the address space).
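The arithmetic behind the !pte output can be reproduced in a few lines. The sketch below is not how the debugger is implemented; the four base constants are assumptions chosen to match the addresses in the output above, and they differ on other Windows versions. The function names are hypothetical.

```python
# Sketch: recompute the page table entry addresses that !pte printed for
# VA 0000009d00000000, and the 512GB region covered by its PXE.
# Assumption: the self-map base addresses below are inferred from the
# !pte output in this dump and are not valid for every Windows release.

PTE_BASE = 0xFFFFF68000000000
PDE_BASE = 0xFFFFF6FB40000000
PPE_BASE = 0xFFFFF6FB7DA00000
PXE_BASE = 0xFFFFF6FB7DBED000
VA_MASK = (1 << 48) - 1  # 48 bits of virtual address are translated


def entry_addresses(va):
    """Return the virtual address of each paging entry that maps va."""
    va &= VA_MASK
    return {
        "PXE": PXE_BASE + ((va >> 39) << 3),  # 512GB per entry
        "PPE": PPE_BASE + ((va >> 30) << 3),  # 1GB per entry
        "PDE": PDE_BASE + ((va >> 21) << 3),  # 2MB per entry
        "PTE": PTE_BASE + ((va >> 12) << 3),  # 4KB per entry
    }


def pxe_region(va):
    """Return (first, last) address of the 512GB region holding va.

    Only meaningful for lower-half (user range) addresses such as the
    one in this dump; no sign extension is applied.
    """
    start = ((va & VA_MASK) >> 39) << 39
    return start, start + (1 << 39) - 1


if __name__ == "__main__":
    va = 0x0000009D00000000
    for name, addr in entry_addresses(va).items():
        print("%s at %016X" % (name, addr))
    first, last = pxe_region(va)
    print("region %016X - %016X" % (first, last))
```

With an invalid PXE, every address from 0000008000000000 through 000000FFFFFFFFFF is unmapped, which is why the faulting pointer cannot be a slightly damaged version of some nearby valid user-mode address.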
Thus, while not the null pointer indicated previously, this is still an invalid address – within a large, undefined region of the address space.
If you choose to look at the first function on the stack in its entirety, you see:
2: kd> uf fltmgr!TreeUnlinkMulti
fltmgr!TreeUnlinkMulti:
fffff880`0106f0f0 fff3 push rbx
fffff880`0106f0f2 55 push rbp
fffff880`0106f0f3 57 push rdi
fffff880`0106f0f4 4883ec30 sub rsp,30h
fffff880`0106f0f8 33ff xor edi,edi
fffff880`0106f0fa 488be9 mov rbp,rcx
fffff880`0106f0fd 4883faff cmp rdx,0FFFFFFFFFFFFFFFFh
fffff880`0106f101 0f840c010000 je fltmgr!TreeUnlinkMulti+0x123 (fffff880`0106f213)

fltmgr!TreeUnlinkMulti+0x17:
fffff880`0106f107 4c89642458 mov qword ptr [rsp+58h],r12
fffff880`0106f10c 4c8be2 mov r12,rdx
fffff880`0106f10f 4983f8ff cmp r8,0FFFFFFFFFFFFFFFFh
fffff880`0106f113 0f85eb450000 jne fltmgr! ?? ::FNODOBFM::`string'+0x504 (fffff880`01073704)

fltmgr!TreeUnlinkMulti+0x29:
fffff880`0106f119 4889742450 mov qword ptr [rsp+50h],rsi
fffff880`0106f11e 488b31 mov rsi,qword ptr [rcx]
fffff880`0106f121 4885f6 test rsi,rsi
fffff880`0106f124 7518 jne fltmgr!TreeUnlinkMulti+0x4e (fffff880`0106f13e)

fltmgr!TreeUnlinkMulti+0x36:
fffff880`0106f126 488bdf mov rbx,rdi

fltmgr!TreeUnlinkMulti+0x39:
fffff880`0106f129 488b742450 mov rsi,qword ptr [rsp+50h]
fffff880`0106f12e 488bc3 mov rax,rbx

fltmgr!TreeUnlinkMulti+0x41:
fffff880`0106f131 4c8b642458 mov r12,qword ptr [rsp+58h]

fltmgr!TreeUnlinkMulti+0x46:
fffff880`0106f136 4883c430 add rsp,30h
fffff880`0106f13a 5f pop rdi
fffff880`0106f13b 5d pop rbp
fffff880`0106f13c 5b pop rbx
fffff880`0106f13d c3 ret

fltmgr!TreeUnlinkMulti+0x4e:
fffff880`0106f13e 488bdf mov rbx,rdi

fltmgr!TreeUnlinkMulti+0x51:
fffff880`0106f141 488b4620 mov rax,qword ptr [rsi+20h]
fffff880`0106f145 483bd0 cmp rdx,rax
fffff880`0106f148 741b je fltmgr!TreeUnlinkMulti+0x75 (fffff880`0106f165)

fltmgr!TreeUnlinkMulti+0x5a:
fffff880`0106f14a 483bd0 cmp rdx,rax
fffff880`0106f14d 720b jb fltmgr!TreeUnlinkMulti+0x6a (fffff880`0106f15a)

fltmgr!TreeUnlinkMulti+0x5f:
fffff880`0106f14f 488b7610 mov rsi,qword ptr [rsi+10h]
fffff880`0106f153 4885f6 test rsi,rsi
fffff880`0106f156 74ce je fltmgr!TreeUnlinkMulti+0x36 (fffff880`0106f126)

fltmgr!TreeUnlinkMulti+0x68:
fffff880`0106f158 ebe7 jmp fltmgr!TreeUnlinkMulti+0x51 (fffff880`0106f141)

fltmgr!TreeUnlinkMulti+0x6a:
fffff880`0106f15a 488b7608 mov rsi,qword ptr [rsi+8]
fffff880`0106f15e 4885f6 test rsi,rsi
fffff880`0106f161 74c3 je fltmgr!TreeUnlinkMulti+0x36 (fffff880`0106f126)

fltmgr!TreeUnlinkMulti+0x73:
fffff880`0106f163 ebdc jmp fltmgr!TreeUnlinkMulti+0x51 (fffff880`0106f141)

fltmgr!TreeUnlinkMulti+0x75:
fffff880`0106f165 4c896c2460 mov qword ptr [rsp+60h],r13
fffff880`0106f16a 4c89742428 mov qword ptr [rsp+28h],r14
fffff880`0106f16f 4c897c2420 mov qword ptr [rsp+20h],r15
fffff880`0106f174 4c8bee mov r13,rsi

fltmgr!TreeUnlinkMulti+0x87:
fffff880`0106f177 4c396620 cmp qword ptr [rsi+20h],r12
fffff880`0106f17b 0f854e450000 jne fltmgr! ?? ::FNODOBFM::`string'+0x4cf (fffff880`010736cf)

fltmgr!TreeUnlinkMulti+0x91:
fffff880`0106f181 4c8bf6 mov r14,rsi
fffff880`0106f184 4c3bee cmp r13,rsi
fffff880`0106f187 0f8532450000 jne fltmgr! ?? ::FNODOBFM::`string'+0x4bf (fffff880`010736bf)

fltmgr!TreeUnlinkMulti+0x9d:
fffff880`0106f18d 41b701 mov r15b,1

fltmgr!TreeUnlinkMulti+0xa0:
fffff880`0106f190 48397500 cmp qword ptr [rbp],rsi
fffff880`0106f194 753c jne fltmgr!TreeUnlinkMulti+0xe2 (fffff880`0106f1d2)

fltmgr!TreeUnlinkMulti+0xa6:
fffff880`0106f196 488b5618 mov rdx,qword ptr [rsi+18h]
fffff880`0106f19a 488bce mov rcx,rsi
fffff880`0106f19d ff15b5e50000 call qword ptr [fltmgr!_imp_RtlDeleteNoSplay (fffff880`0107d758)]
fffff880`0106f1a3 f0816630fffffeff lock and dword ptr [rsi+30h],0FFFEFFFFh
fffff880`0106f1ab 488b7500 mov rsi,qword ptr [rbp]
fffff880`0106f1af 49895e10 mov qword ptr [r14+10h],rbx
fffff880`0106f1b3 498bde mov rbx,r14
fffff880`0106f1b6 4c8bee mov r13,rsi

fltmgr!TreeUnlinkMulti+0xc9:
fffff880`0106f1b9 4885f6 test rsi,rsi
fffff880`0106f1bc 75b9 jne fltmgr!TreeUnlinkMulti+0x87 (fffff880`0106f177)

fltmgr!TreeUnlinkMulti+0xce:
fffff880`0106f1be 4c8b7c2420 mov r15,qword ptr [rsp+20h]
fffff880`0106f1c3 4c8b742428 mov r14,qword ptr [rsp+28h]
fffff880`0106f1c8 4c8b6c2460 mov r13,qword ptr [rsp+60h]
fffff880`0106f1cd e957ffffff jmp fltmgr!TreeUnlinkMulti+0x39 (fffff880`0106f129)

fltmgr!TreeUnlinkMulti+0xe2:
fffff880`0106f1d2 488b5618 mov rdx,qword ptr [rsi+18h]
fffff880`0106f1d6 488bce mov rcx,rsi
fffff880`0106f1d9 ff1579e50000 call qword ptr [fltmgr!_imp_RtlDeleteNoSplay (fffff880`0107d758)]
fffff880`0106f1df f0816630fffffeff lock and dword ptr [rsi+30h],0FFFEFFFFh
fffff880`0106f1e7 48895e10 mov qword ptr [rsi+10h],rbx
fffff880`0106f1eb 4983c8ff or r8,0FFFFFFFFFFFFFFFFh
fffff880`0106f1ef 498bd4 mov rdx,r12
fffff880`0106f1f2 488bcd mov rcx,rbp
fffff880`0106f1f5 488bde mov rbx,rsi
fffff880`0106f1f8 e8b3190000 call fltmgr!TreeLookup (fffff880`01070bb0)
fffff880`0106f1fd 4885c0 test rax,rax
fffff880`0106f200 0f85c1440000 jne fltmgr! ?? ::FNODOBFM::`string'+0x4c7 (fffff880`010736c7)

fltmgr!TreeUnlinkMulti+0x116:
fffff880`0106f206 488bf7 mov rsi,rdi

fltmgr!TreeUnlinkMulti+0x119:
fffff880`0106f209 4584ff test r15b,r15b
fffff880`0106f20c 74ab je fltmgr!TreeUnlinkMulti+0xc9 (fffff880`0106f1b9)

fltmgr!TreeUnlinkMulti+0x11e:
fffff880`0106f20e 4c8bee mov r13,rsi
fffff880`0106f211 eba6 jmp fltmgr!TreeUnlinkMulti+0xc9 (fffff880`0106f1b9)

fltmgr!TreeUnlinkMulti+0x123:
fffff880`0106f213 4983f8ff cmp r8,0FFFFFFFFFFFFFFFFh
fffff880`0106f217 0f8593440000 jne fltmgr! ?? ::FNODOBFM::`string'+0x4b0 (fffff880`010736b0)

fltmgr!TreeUnlinkMulti+0x12d:
fffff880`0106f21d 488b19 mov rbx,qword ptr [rcx]
fffff880`0106f220 4885db test rbx,rbx
fffff880`0106f223 750b jne fltmgr!TreeUnlinkMulti+0x140 (fffff880`0106f230)

fltmgr!TreeUnlinkMulti+0x135:
fffff880`0106f225 488bc7 mov rax,rdi
fffff880`0106f228 4883c430 add rsp,30h
fffff880`0106f22c 5f pop rdi
fffff880`0106f22d 5d pop rbp
fffff880`0106f22e 5b pop rbx
fffff880`0106f22f c3 ret

fltmgr!TreeUnlinkMulti+0x140:
fffff880`0106f230 488bcb mov rcx,rbx
fffff880`0106f233 e828410000 call fltmgr!TreeUnlinkNoBalance (fffff880`01073360)
fffff880`0106f238 48897b10 mov qword ptr [rbx+10h],rdi
fffff880`0106f23c 488bfb mov rdi,rbx
fffff880`0106f23f 488b5d00 mov rbx,qword ptr [rbp]
fffff880`0106f243 4885db test rbx,rbx
fffff880`0106f246 74dd je fltmgr!TreeUnlinkMulti+0x135 (fffff880`0106f225)

fltmgr!TreeUnlinkMulti+0x158:
fffff880`0106f248 ebe6 jmp fltmgr!TreeUnlinkMulti+0x140 (fffff880`0106f230)

fltmgr! ?? ::FNODOBFM::`string'+0x4b0:
fffff880`010736b0 4883caff or rdx,0FFFFFFFFFFFFFFFFh
fffff880`010736b4 e8c72e0000 call fltmgr!TreeUnlinkMultiDoWalk (fffff880`01076580)
fffff880`010736b9 90 nop
fffff880`010736ba e977baffff jmp fltmgr!TreeUnlinkMulti+0x46 (fffff880`0106f136)

fltmgr! ?? ::FNODOBFM::`string'+0x4bf:
fffff880`010736bf 4532ff xor r15b,r15b
fffff880`010736c2 e9c9baffff jmp fltmgr!TreeUnlinkMulti+0xa0 (fffff880`0106f190)

fltmgr! ?? ::FNODOBFM::`string'+0x4c7:
fffff880`010736c7 488bf0 mov rsi,rax
fffff880`010736ca e93abbffff jmp fltmgr!TreeUnlinkMulti+0x119 (fffff880`0106f209)

fltmgr! ?? ::FNODOBFM::`string'+0x4cf:
fffff880`010736cf 488b4608 mov rax,qword ptr [rsi+8]
fffff880`010736d3 4885c0 test rax,rax
fffff880`010736d6 7408 je fltmgr! ?? ::FNODOBFM::`string'+0x4e0 (fffff880`010736e0)

fltmgr! ?? ::FNODOBFM::`string'+0x4d8:
fffff880`010736d8 488bf0 mov rsi,rax
fffff880`010736db e9d9baffff jmp fltmgr!TreeUnlinkMulti+0xc9 (fffff880`0106f1b9)

fltmgr! ?? ::FNODOBFM::`string'+0x4e0:
fffff880`010736e0 488b4610 mov rax,qword ptr [rsi+10h]
fffff880`010736e4 4885c0 test rax,rax
fffff880`010736e7 7408 je fltmgr! ?? ::FNODOBFM::`string'+0x4f1 (fffff880`010736f1)

fltmgr! ?? ::FNODOBFM::`string'+0x4e9:
fffff880`010736e9 488bf0 mov rsi,rax
fffff880`010736ec e9c8baffff jmp fltmgr!TreeUnlinkMulti+0xc9 (fffff880`0106f1b9)

fltmgr! ?? ::FNODOBFM::`string'+0x4f1:
fffff880`010736f1 498bd5 mov rdx,r13
fffff880`010736f4 488bce mov rcx,rsi
fffff880`010736f7 e8941b0000 call fltmgr!FindNextRightSubtree (fffff880`01075290)
fffff880`010736fc 488bf0 mov rsi,rax
fffff880`010736ff e9b5baffff jmp fltmgr!TreeUnlinkMulti+0xc9 (fffff880`0106f1b9)

fltmgr! ?? ::FNODOBFM::`string'+0x504:
fffff880`01073704 e8a7d4ffff call fltmgr!TreeLookup (fffff880`01070bb0)
fffff880`01073709 4885c0 test rax,rax
fffff880`0107370c 7417 je fltmgr! ?? ::FNODOBFM::`string'+0x525 (fffff880`01073725)

fltmgr! ?? ::FNODOBFM::`string'+0x50e:
fffff880`0107370e 488bc8 mov rcx,rax
fffff880`01073711 488bd8 mov rbx,rax
fffff880`01073714 e8471b0000 call fltmgr!TreeUnlink (fffff880`01075260)
fffff880`01073719 48897b10 mov qword ptr [rbx+10h],rdi
fffff880`0107371d 488bc3 mov rax,rbx
fffff880`01073720 e90cbaffff jmp fltmgr!TreeUnlinkMulti+0x41 (fffff880`0106f131)

fltmgr! ?? ::FNODOBFM::`string'+0x525:
fffff880`01073725 488bc7 mov rax,rdi
fffff880`01073728 e904baffff jmp fltmgr!TreeUnlinkMulti+0x41 (fffff880`0106f131)
The current block starts at:
fltmgr!TreeUnlinkMulti+0x4e:
fffff880`0106f13e 488bdf mov rbx,rdi
The mistake in the earlier analysis was to miss the jump backwards several instructions afterwards:
fffff880`0106f158 ebe7 jmp fltmgr!TreeUnlinkMulti+0x51 (fffff880`0106f141)
Thus, we really do have a small block of code under analysis, as shown below:
fltmgr!TreeUnlinkMulti+0x4e:
fffff880`0106f13e 488bdf mov rbx,rdi

fltmgr!TreeUnlinkMulti+0x51:
fffff880`0106f141 488b4620 mov rax,qword ptr [rsi+20h]
fffff880`0106f145 483bd0 cmp rdx,rax
fffff880`0106f148 741b je fltmgr!TreeUnlinkMulti+0x75 (fffff880`0106f165)

fltmgr!TreeUnlinkMulti+0x5a:
fffff880`0106f14a 483bd0 cmp rdx,rax
fffff880`0106f14d 720b jb fltmgr!TreeUnlinkMulti+0x6a (fffff880`0106f15a)

fltmgr!TreeUnlinkMulti+0x5f:
fffff880`0106f14f 488b7610 mov rsi,qword ptr [rsi+10h]
fffff880`0106f153 4885f6 test rsi,rsi
fffff880`0106f156 74ce je fltmgr!TreeUnlinkMulti+0x36 (fffff880`0106f126)

fltmgr!TreeUnlinkMulti+0x68:
fffff880`0106f158 ebe7 jmp fltmgr!TreeUnlinkMulti+0x51 (fffff880`0106f141)
The reader who pointed out the loop here also pointed out the intent of this code fragment:
The code’s overall intention is to walk a given tree, remove the nodes that match a given pair of keys, and return these nodes as a list (linked through the RightChild members only).
The analyst has identified the given tree and has in Figure 7 dumped for us the root node, as the TreeLink member of a _NAME_CACHE_NODE. See there that the LeftChild member is corrupt but not with the value of RSI at the time of the fault. Execution will have worked some distance into the RightChild subtree until reaching a node that has the faulting RSI as either its LeftChild or RightChild member. Most plausibly, this tree was already corrupt when TreeUnlinkMulti was entered. A race condition, whether inside TreeUnlinkMulti or out, is just one of many ways that links in a tree might get corrupted.
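To make the loop and the reader's explanation concrete, here is a rough model of the lookup walk. It is a reading of the disassembly, not fltmgr source: the field layout (+8 LeftChild, +10h RightChild, +20h key) is taken from the offsets in the listing, while the class names, the dictionary standing in for memory, and all node addresses except the faulting pointer are hypothetical.

```python
# Sketch: the binary tree walk at TreeUnlinkMulti+0x51, modelled with a
# dictionary that maps "addresses" to nodes. Following a pointer that is
# not in the dictionary stands in for the page fault seen in the dump.


class Node:
    def __init__(self, key, left=0, right=0):
        self.key = key      # qword at [node+20h]
        self.left = left    # LeftChild, qword at [node+8]
        self.right = right  # RightChild, qword at [node+10h]


class AccessViolation(Exception):
    pass


def find_node(memory, root_addr, key):
    """Return the address of the node holding key, or 0 if absent."""
    addr = root_addr
    while addr != 0:                 # test rsi,rsi
        if addr not in memory:       # unmapped address: the crash
            raise AccessViolation(hex(addr))
        node = memory[addr]          # mov rax,qword ptr [rsi+20h]
        if key == node.key:          # cmp rdx,rax / je
            return addr
        if key < node.key:           # jb: take the left child
            addr = node.left
        else:                        # otherwise take the right child
            addr = node.right
    return 0


if __name__ == "__main__":
    # Hypothetical tree whose right subtree holds a corrupt left link.
    memory = {
        0x1000: Node(50, left=0x2000, right=0x3000),
        0x2000: Node(20),
        0x3000: Node(70, left=0x9D00000000),
    }
    print(hex(find_node(memory, 0x1000, 20)))
    try:
        find_node(memory, 0x1000, 60)
    except AccessViolation as fault:
        print("fault following pointer", fault)
```

The model shows why the fault says little about where the damage was done: the walk only reads links, and it fails on the first corrupt link it happens to follow, however long ago that link was overwritten.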
Thus, we arrive at much the same conclusion as before: the data is corrupt. It seems unlikely that the corruption occurred in this code, but it is clear that corruption is present.
As noted previously, we’ve seen similar data corruption on a different computer system, also running Windows 7 x64. In the first dump, we observed what appears to be a single bit error in memory. By itself, that led us to suspect the machine. Seeing this on a different computer in similar circumstances makes us suspect there is some source of data corruption in the code. While a race condition is a potential source of data corruption, it’s not the only possibility.
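When comparing dumps for a pattern, one quick check is whether a bad value differs from a plausible good one by exactly one bit. The helper below is a minimal sketch with hypothetical names; the only value taken from this dump is the faulting RSI, compared here against the null pointer the earlier analysis had assumed.

```python
# Sketch: list the bit positions at which two 64-bit values differ, to
# help tell a single bit error apart from a wider overwrite.


def flipped_bits(expected, observed):
    """Return the bit positions (0 = least significant) that differ."""
    diff = expected ^ observed
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


def is_single_bit_error(expected, observed):
    return len(flipped_bits(expected, observed)) == 1


if __name__ == "__main__":
    # RSI from this dump versus a null link: several bits differ.
    print(flipped_bits(0, 0x0000009D00000000))
    # Hypothetical pointer with one bit flipped, for contrast.
    good = 0xFFFFFA8001234560
    print(is_single_bit_error(good, good ^ (1 << 21)))
```

Five differing bits, all in the upper 32 bits of the value, do not look like the single bit error seen in the first dump, which is consistent with suspecting something other than faulty memory in this case.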
Data corruption issues are often the most difficult to track down. Frequently the source of the corruption shows up from a pattern that materializes after reviewing a number of crash dumps, not a single crash dump. While we still do not know the actual issue here, we’ll be on the look-out for it in the future and invite our readers to share their own observations if they see it as well.