How can I find a rare bug that seems to only occur in release builds?

I have a fairly large solution that occasionally crashes. Sadly, these crashes appear to occur only in release builds. When I attach the debugger upon crashing, I get the message:

"No symbols are loaded for any call stack frame. The source code cannot be displayed"

This makes it quite hard to find the cause of the crashes. I am using the default release build settings of Visual Studio 2008, in which 'Debug Information Format' is set to 'Program Database (/Zi)'.

Do you have any tips that might help me find the bug? For example, could I change some settings in my projects so that the crashes might still occur but get more meaningful information in the debugger?

Update: The problem was a very rarely occurring logic error that in itself should not cause a crash, but apparently caused a crash elsewhere. Solving the logic error solved the crashing behavior.

To anyone who came here looking for a resolution of a similar problem: best of luck, you're in for a rough ride. What eventually helped me locate the problem was adding a lot of bounds checks in the code (that I could enable/disable with preprocessor directives), compiling for Linux, and running with gdb/valgrind.
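
For illustration only, switchable bounds checks of the kind described above might look something like the sketch below. This is not the code from the actual project: the macro names ENABLE_BOUNDS_CHECKS and CHECK_INDEX and the helper checked_at are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical switch: build with -DENABLE_BOUNDS_CHECKS=0 (or /DENABLE_BOUNDS_CHECKS=0)
// to compile every check out again.
#ifndef ENABLE_BOUNDS_CHECKS
#define ENABLE_BOUNDS_CHECKS 1
#endif

#if ENABLE_BOUNDS_CHECKS
// Throws with file and line, so the failure is reported where the bad index is used
// rather than wherever the corrupted memory is read later.
// Note: the index expression is evaluated more than once, so avoid side effects in it.
#define CHECK_INDEX(index, size)                                      \
    do {                                                              \
        if (static_cast<std::size_t>(index) >=                        \
            static_cast<std::size_t>(size)) {                         \
            std::ostringstream msg;                                   \
            msg << __FILE__ << ":" << __LINE__ << ": index "          \
                << (index) << " out of range for size " << (size);    \
            throw std::out_of_range(msg.str());                       \
        }                                                             \
    } while (0)
#else
#define CHECK_INDEX(index, size) ((void)0)
#endif

// Hypothetical accessor showing how a call site would use the macro.
inline int checked_at(const std::vector<int>& values, std::size_t index) {
    CHECK_INDEX(index, values.size());
    return values[index];
}
```

With the checks enabled, an out-of-range index fails immediately at the call site, in release builds too. With them disabled, the macro expands to nothing and the code behaves as before.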

+5 A:

Richard 2010-07-15 10:58:19

+4 A:

A few reasons why a debug build might not allow a defect to express itself:

Some debug configurations initialize all variables.
Debug memory allocations and deallocations might be more forgiving of pointer abuse.
The debug build might execute at a different speed thus masking a race condition.

Since you are using C++ you might consider using a dynamic analysis tool like valgrind to point out possible uninitialized data and pointer mishandling at run time.

Race conditions may be tracked down by adding log output with time stamps. You first have to narrow down where in your "large solution" the problem occurs by observing what happened just prior to the crash. Be sure to use a deferred logging mechanism -- one that does the string processing later or in another thread so it doesn't itself impact the timing too much.
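
A minimal sketch of such a deferred logger, assuming a C++11 or later compiler (newer than the Visual Studio 2008 mentioned in the question). All names here (TraceRecord, DeferredTrace, kTraceCapacity) are invented for the example. The hot path only stores a timestamp, a pointer and a number; the string formatting happens later in dump().

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Number of records kept; older records are overwritten once the buffer wraps.
const std::size_t kTraceCapacity = 1024;

struct TraceRecord {
    std::chrono::steady_clock::time_point when;
    const char* event;  // must point to a string literal or other static storage
    long value;
};

class DeferredTrace {
public:
    DeferredTrace() : next_(0) {}

    // Hot path: one atomic increment and three stores, no formatting and no I/O.
    // Assumption: writers never lap each other within one pass over the buffer.
    void record(const char* event, long value) {
        assert(event != nullptr);
        const std::size_t slot =
            next_.fetch_add(1, std::memory_order_relaxed) % kTraceCapacity;
        records_[slot].when = std::chrono::steady_clock::now();
        records_[slot].event = event;
        records_[slot].value = value;
    }

    // Cold path: format the retained records, oldest first.
    // Intended to be called after the writers have stopped (e.g. from a crash handler
    // or at shutdown), since it does not synchronize with record().
    void dump(std::ostream& out) const {
        const std::size_t total = next_.load(std::memory_order_relaxed);
        const std::size_t count = total < kTraceCapacity ? total : kTraceCapacity;
        for (std::size_t i = total - count; i < total; ++i) {
            const TraceRecord& r = records_[i % kTraceCapacity];
            const long long us =
                std::chrono::duration_cast<std::chrono::microseconds>(
                    r.when.time_since_epoch()).count();
            out << us << "us " << r.event << " " << r.value << "\n";
        }
    }

    // Convenience wrapper around dump() for inspection in a debugger or test.
    std::string to_string() const {
        std::ostringstream out;
        dump(out);
        return out.str();
    }

private:
    std::atomic<std::size_t> next_;
    TraceRecord records_[kTraceCapacity];
};
```

Call record("some_event", id) at the points of interest and dump the buffer once the crash (or the suspicious state) has been reached. Because only string literals are stored, nothing is allocated or formatted while the timing-sensitive code runs.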

Amardeep 2010-07-15 11:07:26

+1 A:

Did you know you can still debug release builds? Just hit F5 (rather than CTRL+F5) to run in debug.

Is it repeatable i.e. are you doing something specific when it goes bang?

If so, set a breakpoint in your code before the crash and hit F5 to run in debug (make sure you're using your release build though). Then step through until your app crashes. I generally find this faster than adding logging and debug print statements.

If not, just running in debug mode will sometimes catch the error and halt on the offending line.

Failing that, Richard and Amar's answers are good :-)

Jon Cage 2010-07-15 11:14:45

+5 A:

If the code crashes after optimisation is applied (as in the default release), it is most likely that your code is in some way flawed and relies on undefined behaviour which changes between the release and debug build.

Try switching off optimisation in the release build to see if the problem goes away (or switch it on in the debug build to see if it occurs). If it does, you should still aim to find and fix the bug, but you will at least know to be looking for undefined behaviour.

Set the compiler warning level to maximum (/W4) and warnings as errors (/WX) and fix all warnings (and not simply by casting everything in sight - think about it!). When optimisation is applied, you may well get warnings that did not occur in the debug build because of the more extensive code analysis that is performed - this is useful static analysis.

You can if you wish switch debugging on in an optimised build, but it is unlikely that you will be able to follow what is going on since the optimiser may re-order code, and remove code and variables.

Clifford 2010-07-15 11:20:44

+3 A:

Sounds to me like the stack frame was blown. Trivial to do with a buffer overflow, just copy a large string into a small char[] for example. That wipes out the return address. The code just keeps running until the return, then bombs when it pops a bad address off the stack. Or worse, if the address happens to be valid.

The debugger cannot display anything meaningful since it cannot walk the stack to show you how the code got to the crash location. The actual crash location doesn't tell you anything.

Tough as nails to debug. You have to get it reproducible and you need either stepping or tracing to find the last known-good function. The one that produces the crash after stepping out of it is the one with the bug. You can actually see the statement that does the damage: the debugger call stack suddenly goes catatonic. If you can't get a consistent repro then a thorough code review is all that's left. You can justify the time by calling it a "security review". Good luck with it.
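
To make the overflow scenario concrete, here is a small sketch (C++11 assumed). The dangerous pattern is shown only in a comment, since actually running it is undefined behaviour. The helper copy_bounded is a hypothetical name, one possible thing to look for or introduce during such a review, not a standard function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The hazard, as a comment only:
//     char name[8];
//     std::strcpy(name, input);  // overruns name once input has 8 or more characters,
//                                // possibly overwriting the saved return address

// Hypothetical bounded replacement: copies at most N-1 characters, always
// null-terminates, and returns false when the source had to be truncated.
template <std::size_t N>
bool copy_bounded(char (&dest)[N], const char* src) {
    static_assert(N > 0, "destination must have room for the terminator");
    assert(src != nullptr);
    const std::size_t len = std::strlen(src);
    const std::size_t n = len < N - 1 ? len : N - 1;
    std::memcpy(dest, src, n);
    dest[n] = '\0';
    return len < N;  // true when the whole string fit
}
```

Because the array size is deduced from the destination's type, the helper cannot be handed a wrong length by accident, and the return value gives a place to log or break when truncation happens.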

Hans Passant 2010-07-15 12:06:49

An uninitialized variable (a pointer, perhaps) could also be causing the problem. Perhaps you should run a static analysis program over your code - CppCheck isn't bad.

Rob 2010-07-15 13:00:27
