I've used minidumps on many game projects over the years and they seem to have about a 50% chance of having a valid call stack. What can I do to make them have better call stacks?

I've tried putting the latest dbghelp.dll in the exe directory. That seems to help some.

Is Visual Studio 2008 or 2010 any better? (I'm still on VS 2005).

The code I use looks like this sample.

+5  A: 

What's missing from your call stack? Do you have a bunch of addresses that don't resolve to valid function names (i.e., 0x8732ae00 instead of CFoo::Bar())? If so, then what you need is to put your .PDBs where your debugger can find them, or set up a symbol server and set the "Symbol Paths" in the right-click context menu of the Modules pane.
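To make the symptom concrete, here is a minimal sketch of what resolving an address amounts to once symbols are available: find the function whose range covers the address and print it as name plus offset, falling back to the raw hex value otherwise. The `Symbol` struct, `resolve`, `demoSymbols` and all addresses are made up for illustration; a real debugger reads this information from the .PDB rather than from a hand-written table.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// One entry of a hypothetical symbol table: where a function starts and how
// many bytes of code it covers.
struct Symbol {
    std::uint32_t start;
    std::uint32_t size;
    std::string name;
};

// Resolve an address to "Name+0xOffset", or to a bare hex address when no
// symbol covers it. `table` must be sorted by start address.
std::string resolve(const std::vector<Symbol>& table, std::uint32_t addr) {
    // Find the first symbol that starts strictly after addr, then step back one.
    auto it = std::upper_bound(
        table.begin(), table.end(), addr,
        [](std::uint32_t a, const Symbol& s) { return a < s.start; });
    char buf[32];
    if (it != table.begin()) {
        const Symbol& s = *(it - 1);
        if (addr - s.start < s.size) {
            std::snprintf(buf, sizeof buf, "+0x%x",
                          static_cast<unsigned>(addr - s.start));
            return s.name + buf;
        }
    }
    // No symbol covers this address: this is the "raw hex" frame you see
    // when the matching symbols are missing.
    std::snprintf(buf, sizeof buf, "0x%08x", static_cast<unsigned>(addr));
    return buf;
}

// Made-up symbol table, standing in for what a .PDB would provide.
std::vector<Symbol> demoSymbols() {
    return {
        {0x00401000, 0x40, "main"},
        {0x00401040, 0x80, "CFoo::Bar"},
    };
}
```

With this table, an address inside `CFoo::Bar` resolves to a name and offset, while an address no symbol covers stays as hex, which is what a frame looks like when the .PDB for that module can't be found.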

We store every .PDB from every binary every time someone checks in a new Perforce changelist, so that when a dump comes back from anyone inside the office or any customer at retail, we have the .PDB corresponding to the version of the game they were running. With the symbol server and paths set, all I have to do is just double-click the .mdmp and it works every time.

Or do you have a call stack that appears to only have one function in it? Like, 0x8538cf00 without anything else above it in the stack? If so, then your crash is actually the stack itself being corrupted. If the return addresses in the backchain have been overwritten, naturally the debugger will be unable to resolve them.

Sometimes also you'll find that the thread that actually emits the minidump is not the one that threw the exception that caused the crash. Look in the Threads window to see if one of the other threads has the offending code in it.

If you are debugging a "Release" build -- that is to say, one compiled with all optimization flags turned on -- you will have to live with the fact that the debugger will have trouble finding local variables and some other data. This is because turning on optimizations means allowing the compiler to keep data in registers, collapse calculations, and generally do a variety of things that prevent data from ever actually being written to the stack. If this is your problem then you'll need to open up the disassembly window and chase the data by hand, or rebuild a debug binary and reproduce the problem where you can look at it.

Crashworks
0x8732ae00 is an unlikely address, it's in kernel space (with the 2GB setup of x86-32). 0x7_______ addresses are more common, because the Windows DLLs hug the 2GB boundary. This reduces the number of relocations needed. If you don't see symbols for them, use the _Microsoft_ Symbol Server.
MSalters
I was just pulling addresses at random for example (in this case that's where a particular set-top console likes to relocate the user-mode DLLs).
Crashworks
A: 

I don't use minidumps, but rather dump the stack by "hand" into a log file (see www.ddj.com/cpp/185300443 and http://stackoverflow.com/questions/590160/how-to-log-stack-frames-with-windows-x64).

I see behavior similar to yours: sometimes there is a valid call stack, sometimes there is not. In a small number of cases the stack might really be corrupted. In maybe 1/3 of all cases the installed exception handler is not called at all! I guess that it's somehow a problem with Windows structured exception handling.

RED SOFT ADAIR
+2  A: 

Turn off frame pointer omission (FPO) if you need stack dumps; in MSVC that means compiling with /Oy-. Frame pointers are used to explicitly define stack frames. Without them, the debugger has to deduce the location of each frame.
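As a rough illustration of why frame pointers help, here is a sketch of a frame-chain walk over a simulated 32-bit x86 stack. It assumes the conventional layout where `[ebp]` holds the caller's saved EBP and `[ebp+4]` holds the return address. The `Memory` map, `walkFrames`, `demoStack` and every address are hypothetical stand-ins for the memory captured in a dump.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Simulated dump memory: address -> 32-bit value stored there.
using Memory = std::map<std::uint32_t, std::uint32_t>;

// Follow the saved-EBP chain and collect one return address per frame.
std::vector<std::uint32_t> walkFrames(const Memory& mem, std::uint32_t ebp,
                                      std::size_t maxFrames = 64) {
    std::vector<std::uint32_t> returns;
    while (ebp != 0 && returns.size() < maxFrames) {
        auto saved = mem.find(ebp);      // caller's EBP
        auto ret = mem.find(ebp + 4);    // return address into the caller
        if (saved == mem.end() || ret == mem.end()) break;  // unreadable memory
        returns.push_back(ret->second);
        // The stack grows down, so the caller's frame must sit at a higher
        // address. Anything else means end of chain or a bad link.
        if (saved->second <= ebp) break;
        ebp = saved->second;
    }
    return returns;
}

// Made-up three-frame stack for illustration.
Memory demoStack() {
    return {
        {0x0012ff00, 0x0012ff20}, {0x0012ff04, 0x00401052},
        {0x0012ff20, 0x0012ff40}, {0x0012ff24, 0x00401010},
        {0x0012ff40, 0x00000000}, {0x0012ff44, 0x7c816fd7},
    };
}
```

On the intact demo stack the walk yields three return addresses. If the innermost saved EBP and return address are overwritten with garbage such as 0x41414141, the walk yields a single bogus address and stops, which is the one-entry call stack you get from a trashed stack. Without frame pointers there is no chain to follow at all, so the debugger has to rely on unwind data from the symbols instead.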

MSalters
This is a good idea. With PDBs and the original DLLs, though, MSVC's debugger can work out the stack frames with FPO anyway, but of course its job becomes that much harder. I know this because we compile with FPO and I get stacks out of minidumps all the time.
Crashworks
It's of course easy if the "crash" is due to a manual INT 3 breakpoint. Problem is, most crashes don't happen at exactly the buggy instruction. The CPU stumbles on for a while, until a fault is triggered. In the meantime, the code executed isn't working as intended and may corrupt the program's state quite a bit. This may include executing instructions you didn't intend to be executed (particular nastiness: indirect jumps via a misinterpreted vtable).
MSalters
Yeah, the absence of frame pointers definitely makes the task of fishing back through the stack by hand much harder. Even if the program died by jumping through a wild vfunc pointer, you can usually figure out where it came from because the CALL op pushes IP onto the stack, but finding it and then working out where all the locals have gone can become an arduous exercise in working backwards one op at a time. If you do find yourself up this creek, WinDbg has the helpful `dps` command, which dumps memory as pointer-sized values and shows the matching symbol for each one that points into known code; that can help you hunt for the old EIP.
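The manual hunt boils down to looking at each pointer-sized word on the stack and asking whether it points into code. A minimal sketch of that idea follows; `CodeRange`, `Candidate` and `scanForCodePointers` are hypothetical names, and the module ranges would come from the dump's module list in practice.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Address range [begin, end) occupied by a loaded module's code.
struct CodeRange {
    std::uint32_t begin;
    std::uint32_t end;
};

// A stack slot whose value points into some module's code.
struct Candidate {
    std::size_t slot;      // index of the word within the scanned stack
    std::uint32_t value;   // the possible return address found there
};

// Scan raw stack words and report every value that lands inside a known
// code range. Hits are only candidates: stale values left by earlier calls
// look identical to live return addresses.
std::vector<Candidate> scanForCodePointers(
        const std::vector<std::uint32_t>& stackWords,
        const std::vector<CodeRange>& code) {
    std::vector<Candidate> hits;
    for (std::size_t i = 0; i < stackWords.size(); ++i) {
        for (const CodeRange& r : code) {
            if (stackWords[i] >= r.begin && stackWords[i] < r.end) {
                hits.push_back({i, stackWords[i]});
                break;  // one matching range is enough for this word
            }
        }
    }
    return hits;
}
```

Each hit still has to be checked against the disassembly: a genuine return address sits immediately after a CALL instruction in the caller.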
Crashworks