I know I'm grasping at straws here, but this one is a mystery. Any pointers or help would be most welcome, so I'm appealing to those more intelligent than I:

We have a crash that occurs only in our release binaries. The crash takes place as the binary is shutting itself down and terminating the sub-libraries it depends on. Whether it can be reproduced depends on the machine: some reproduce the crash 100% of the time, some don't exhibit the issue at all, and some are in between. The crash is deep within one of the sub-libraries, and there is a good likelihood the stack is corrupt by the time the rubble can be brought into a debugger (MSVC 2008 SP1) to be examined. Running the binary under the debugger prevents the bug from happening, as does remote debugging, as does (of all things) connecting to the machine via VNC. We also tried installing the Microsoft Driver Development Kit, and doing so squelched the bug as well.

What would be the next best place to look? What tools would be best in this circumstance? Does it sound like a race condition, or something else?

+1  A: 

Have you tried Rational Purify? I used it some 4-5 years ago, and at the time it was helpful in tracking down memory bugs, stack corruption, invalid handles, etc.

mdma
+1  A: 

Try AppVerifier and GFlags together, with page heap enabled, to find heap corruption.

You'll likely need WinDbg as your debugger instead of Visual Studio to debug.

I also recommend this book on advanced Windows debugging for tracking down crashes such as the one you are hitting.

selbie
+1  A: 

Are you using the threadpool by any chance and not cancelling or waiting for outstanding job objects to complete?

Alienfluid
A: 

The problem was conflicting settings of the pernicious _SECURE_SCL flag under Visual Studio, which caused a silent ABI incompatibility between the DLL and one of its dependencies.
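
For anyone who hits the same thing, here is a rough sketch of one way to catch such a mismatch at load time rather than at shutdown. It is not what we actually shipped. The names `MODULE_SECURE_SCL_VALUE`, `module_secure_scl` and `secure_scl_matches` are made up for illustration, and the sketch assumes every module can be rebuilt with a small exported function added:

```cpp
#include <cstddef>  // any standard header; with MSVC this should make the
                    // library's default _SECURE_SCL visible if none was set

// _SECURE_SCL is specific to MSVC. On other compilers it is undefined, so
// report -1 there to keep the sketch compilable everywhere.
#if defined(_SECURE_SCL)
#  define MODULE_SECURE_SCL_VALUE _SECURE_SCL
#else
#  define MODULE_SECURE_SCL_VALUE (-1)
#endif

// Hypothetical: each DLL would export a function like this one (under its
// own name), so that it reports the value it was compiled with.
extern "C" int module_secure_scl() { return MODULE_SECURE_SCL_VALUE; }

// Hypothetical host-side check: compare the host's own value with the one a
// dependency reports, and refuse to pass STL containers across the boundary
// if the two differ.
inline bool secure_scl_matches(int host_value, int dependency_value) {
    return host_value == dependency_value;
}
```

The host would call each dependency's exported function right after loading it and compare the result against its own `MODULE_SECURE_SCL_VALUE`. A plain `int` crosses the DLL boundary safely even when the container layouts disagree, which is why the check does not use any STL types itself.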

fbrereto