I had a program that was crashing at the customer site, but not in any way I could reproduce back at the office. I wasn’t able to load up a dev environment on the affected box, but someone had the bright suggestion of running Dr Watson (DrWtsn32.exe), and checking out the stack trace. As we all know, if you know your codebase well enough, you can almost explain a crash just by where it happened.
So I created an exe with debug symbols and optimizations removed, and waited for the crash. The DrWtsn32.log contained, amongst other things, the dump of the thread that caused the program crash:
*----> Stack Back Trace <----*
FramePtr ReturnAd Param#1 Param#2 Param#3 Param#4 Function Name
0100F9E8 00438172 01010460 0100FBE0 0100FB00 00D85698 !<nosymbols>
0100FAF4 0048DD45 01010460 0100FC54 0100FBFC 00D85698 !<nosymbols>
0100FBEC 004C3DBE 0106D028 00000114 0100FC60 0100FC64 !<nosymbols>
0100FC54 004C3C17 00D8AFC0 00000114 0100FD60 0100FD70 !<nosymbols>
0100FCDC 004CBDE9 010101F0 00000110 0100FDE0 0100FD70 !<nosymbols>
0100FD60 004C6FC5 010101F0 00000110 0100FE44 0100FDF0 !<nosymbols>
0100FDE0 004B63BD 010101F0 00000110 0100FEEC 0100FF00 !<nosymbols>
0100FE50 004B61B6 010101F0 00000110 0100FF7C 0100FF00 !<nosymbols>
0100FEEC 004B5C9C 010101F0 00000110 00000001 000001FB !<nosymbols>
0100FF7C 1020BFD2 00D84058 000001FB 00130178 00D85698 !<nosymbols>
0100FFB4 77E8B2D8 00D85698 000001FB 00130178 00D85698 !beginthreadex
0100FFEC 00000000 1020BF20 00D85698 00000000 00000008 kernel32!lstrcmpiW
Which added nothing to my understanding of what went wrong, beyond the fact that it happened at some point after the thread started. Great. Did I mention I’d done a debug build – the kind with all the symbols compiled in? Anyway, I figured switching to the binary output (which I understand is like a unix core file) might provide further explanation.
The format of the binary output from Dr Watson is lost in the mists of time. I had to go to an NT4 server install CD (you have to love the pack-rats who keep the German Server version of NT4 ©1999!) to locate the two necessary files for working with the user.dmp file it generates. The files are DUMPCHK.EXE and DUMPEXAM.EXE.
Running DUMPCHK.EXE gives a little output:
C:\...\RetailGateway>dumpchk user.dmp
Filename . . . . . . .user.dmp
Signature. . . . . . .USER
ValidDump. . . . . . .DUMP
MajorVersion . . . . .5
MinorVersion . . . . .0
DirectoryTableBase . .0x0000014c
PfnDataBase. . . . . .0x00000004
PsLoadedModuleList . .0x00000018
PsActiveProcessHead. .0x0000009a
MachineImageType . . .
NumberProcessors . . .7528
BugCheckCode . . . . .0x00001f65
BugCheckParameter1 . .0x000000a0
BugCheckParameter2 . .0x00000040
BugCheckParameter3 . .0x00001ca8
BugCheckParameter4 . .0x00d6df65
ExceptionCode. . . . .0xc0000005
ExceptionFlags . . . .0x00000000
ExceptionAddress . . .0x0045b4a9
ExceptionParam#0 . .0x00000000
ExceptionParam#0 . .0x00000000
At which point I got this dialog:
“Oh well,” I thought, “I’ll just head straight on to examining the dump, rather than checking it.” Unfortunately, DUMPEXAM.EXE wasn’t on my side. All I got was:
C:\...\RetailGateway>dumpexam user.dmp
unsupported processor type
Which is useless. Which is the sum value of the binary dump from Dr Watson. And, in this case, all of the output from Dr Watson.
Why don’t you try that utility Simon Bullen wrote? Um… hang on Simon Bullen… Clockwork Universe… Fortify!!
Here it is. Have you used it before? It’s a fantastic tool for tracking memory allocations and diagnosing memory issues.
That’s looking pretty cool, like a cut-down version of Purify. Unfortunately, the root cause of this particular problem was dereferencing a null pointer, which Fortify is no help with. And it got to be null because of a public class member variable, which is often said to be the work of the devil. So, the solution is to make the member private and create accessors, which would make it obvious that the member could be changed. And Fortify is no help there either.
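For what it’s worth, here is a minimal sketch of that kind of fix. The names (Connection, Session, setConnection) are made up for illustration and are not taken from the actual codebase, and throwing on a null argument is just one possible policy:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for whatever the real member pointed at.
struct Connection {
    std::string host;
};

// Before: the pointer is public, so any code anywhere can null it silently.
struct SessionBefore {
    Connection* connection = nullptr;
};

// After: the pointer is private and every change goes through one setter.
class Session {
public:
    explicit Session(Connection* connection) { setConnection(connection); }

    // The only place the pointer can change; a null is rejected here,
    // at the point of the mistake, instead of crashing later on use.
    void setConnection(Connection* connection) {
        if (connection == nullptr) {
            throw std::invalid_argument("Session: connection must not be null");
        }
        connection_ = connection;
    }

    // Read access; the setter guarantees the pointer is never null.
    Connection& connection() const {
        assert(connection_ != nullptr);
        return *connection_;
    }

private:
    Connection* connection_ = nullptr;
};
```

With the member private, a breakpoint or log line in setConnection shows exactly who changes the pointer, which is the visibility the public member never gave me.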
Yep, true. Fortify couldn’t have saved you there. It does save a *lot* of other situations so I’d still recommend using it in most projects. I’ve found memory problems that I wasn’t even looking for in programs that run successfully.