OK, OK, I’ll admit it… Debugging crash dumps can get tedious fast. In a recent system hang that I analyzed, there were 1,200 threads in the system. Given no details on what was going on at the time of the hang, my eyes would probably start bleeding before I found the threads that were interesting. And that wasn’t even a big terminal server machine with lots of sessions running, so it was fairly tame by modern standards.
This is, of course, why we rely on our automated tools to do the heavy lifting for us. When the system crashes, we don’t go look up the bugcheck code and start trying to decode trap frames or context records on the stack; we rely on !analyze -v to do this work for us. In my 1,200 thread case, I didn’t bother taking a detailed look at every thread in the system but instead relied on !stacks 2 to give me a summary of threads so that I could quickly scan for something that looked “interesting.”
What I think we often lose sight of, though, is that these commands aren’t magic. At some point in time, someone thought that their job would be made easier if there was a command that would quickly provide summary analysis information at the time of a crash. Someone else thought sifting through the output of !process 0 7 was far too painful and came up with a command to provide a summary view.
Why is it that we don’t all think this way? Instead, we tend to rely on the existing commands and then complain that they don’t work the way we’d like. Between WinDbg’s scripting capabilities and its debugger extension support, it’s hard to say that there is anything it can’t do. The usual responses to that, though, are “the scripting language is too cryptic” or “I don’t know how to write an extension.” But, in reality, any language is cryptic until you take the time to learn it, and most of us didn’t know how to put pants on at some point in our lives. However, one day we decided that it was an important skill, looked at some examples, and practiced (if you still don’t know how to put pants on, I apologize and you are deemed exempt from the remainder of this perspective).
The other trap that we all fall into is that we just don’t allow ourselves the time to write a script or extension that will potentially save us hours down the road. When debugging a difficult problem, we tend to get tunnel vision and refuse to tear ourselves away from the problem to do something so frivolous. I definitely get dragged down into this one, and by the time I’ve figured the problem out the extension idea leaves my mind. Until I get the next dump of course, at which point I really wish I had written that extension…
So, I think it’s time for a regime change. Let’s promote debugging to first-class citizenship and spend the time necessary to make our lives easier. Let’s declare August 2010 the Month of Debugging Smarter and start rethinking how we approach solving our debugging problems. See The Basics of Debugger Extensions to get started. To get your creative juices flowing, there’s even the source of a kernel mode implementation of !uniqstack for you to play with. This command scans all of the threads in the system and, when finished, provides the !thread output for each thread that has a unique call stack sequence. As an example of how many threads this eliminates, my 1,200 thread system ended up having only 105 unique call chain sequences, which is certainly much more manageable.
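To make the idea behind a unique-stack command concrete, here is a minimal sketch of the grouping step, written in Python rather than as a real debugger extension. It is not the source of !uniqstack. It assumes the call stacks have already been collected as lists of symbol names, and the thread IDs and frames in the sample data are made up purely for illustration.

```python
def unique_stacks(threads):
    """Group threads by their call stack sequence.

    `threads` is an iterable of (thread_id, frames) pairs, where `frames`
    is a sequence of symbol names ordered from innermost to outermost.
    Returns a dict that maps each distinct frame tuple to the list of
    thread IDs sharing it, in first-seen order.
    """
    groups = {}
    for thread_id, frames in threads:
        key = tuple(frames)  # tuples are hashable, lists are not
        groups.setdefault(key, []).append(thread_id)
    return groups


def summarize(groups):
    """Build one summary line per unique call stack."""
    lines = []
    for frames, ids in groups.items():
        lines.append(
            "%d thread(s), first is %#x: %s"
            % (len(ids), ids[0], " <- ".join(frames))
        )
    return lines


# Hypothetical sample data: four threads, two distinct call stacks.
IDLE_WORKER = ["nt!KiSwapContext", "nt!KeWaitForSingleObject", "nt!ExpWorkerThread"]
STUCK_IO = ["nt!KiSwapContext", "nt!KeWaitForSingleObject", "mydriver!WaitForIo"]

SAMPLE_THREADS = [
    (0x04, IDLE_WORKER),
    (0x08, IDLE_WORKER),
    (0x0C, STUCK_IO),
    (0x10, IDLE_WORKER),
]

if __name__ == "__main__":
    for line in summarize(unique_stacks(SAMPLE_THREADS)):
        print(line)
```

Keying on the whole frame sequence means two threads only collapse into one entry when every frame matches. A real extension would have to decide whether to compare symbol names, return addresses, or something looser, and that choice is left open here.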
So no more excuses about not knowing how to write your own extension! And, if you do write your own extension, email me at ap@osr.com to let me know about it. Hopefully I can gather submissions from The NT Insider readership and put together some interesting tidbits for our next issue.
Analyst’s Perspective is a column by OSR consulting associate Scott Noone. When he’s not root-causing complex kernel issues, he’s leading the development and instruction of OSR’s Kernel Debugging & Crash Analysis seminar. Comments on this article or suggestions for a future submission can be addressed to ap@osr.com.