I have a fairly complex (approx 200,000 lines of C++ code) application that has decided to crash, although it crashes a little differently on a couple of different systems. The trick is that it doesn't crash or trap out in the debugger. It only crashes when the application .EXE is run independently (either the debug EXE or the release EXE - both behave the same way). When it crashes in the debug EXE and I get it to start debugging, the call stack is buried deep in the Windows/MFC part of things and doesn't reflect any of my code. Perhaps I'm seeing a stack corruption of some sort, but I'm just not sure at the moment. My question is more general - it's about tools and techniques.

I'm an old programmer (C and assembly language days), and a relative newcomer (couple/few years) to C++ and Visual Studio (2003 for this project).

Are there tricks or techniques anyone's had success with in tracking down crashing issues when you cannot make the software crash in a debugger session? Stuff like permission issues, for example?

The only thing I've thought of is to start plugging in debug/status messages to a logfile, but that's a long, hard way to go. Been there, done that. Any better suggestions? Am I missing some tools that would help? Is VS 2008 better for this kind of thing?

Thanks for any guidance. Some very smart people here (you know who you are!).

cheers.

A: 

It might be that you have an object on the stack that is too big...

Explanation (from comments):

I give this answer because it's the only case I've seen that a debugger (VS or CodeWarrior) couldn't catch and that seemed mysterious. Most of the time, the culprit was the big application object defined on the stack in the main() function, with members not allocated on the heap. Just calling new to instantiate the object fixed the obscure problem. I didn't need a specific tool for that in the end.
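
To illustrate the stack-versus-heap point, here is a minimal sketch. BigApplication and runOnHeap are hypothetical names, not taken from the application in the question, and the 8 MB size is chosen only because it exceeds the 1 MB default stack that the Visual C++ linker reserves. It uses std::unique_ptr, which needs a C++11 compiler; with VS 2003, plain new and delete would play the same role.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical size: 8 MB, larger than the 1 MB default stack on Visual C++.
const std::size_t kBufferSize = 8 * 1024 * 1024;

// Hypothetical stand-in for a large application object whose members
// are stored inline instead of being allocated on the heap.
struct BigApplication {
    char buffer[kBufferSize];

    int run() {
        buffer[kBufferSize - 1] = 1;  // touch the far end of the object
        return buffer[kBufferSize - 1];
    }
};

// Risky variant (not compiled here): declaring `BigApplication app;` as a
// local variable in main() would place all 8 MB on the stack.

// Safer variant: the object lives on the heap and only a pointer
// occupies stack space.
int runOnHeap() {
    std::unique_ptr<BigApplication> app(new BigApplication());
    assert(app.get() != 0);
    return app->run();
}
```

Calling runOnHeap() from main() keeps the stack footprint to a few bytes regardless of how large BigApplication grows.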

Klaim
Yeah, I'd agree. I've got a lot of big objects on the stack, and I assume one or more of them have run away.
Good lead. I'll look for these in my app. You may have hit my specific issue on the head! Thanks.
Was it your problem in the end?
Klaim
+1  A: 

lint.

http://stackoverflow.com/questions/632057/c-c-free-alternative-to-lint

S.Lott
Good lead. I'll start digging through them. Thank you!
+1  A: 

I've not done C++ professionally for over 10 years, but back in the day I used Rational PurifyPlus, which would be a good start, as would BoundsChecker (if it still exists!). These products find out-of-bounds accesses, corrupted memory, corrupted stacks and other problems that can go undetected until "boom", at which point you have no idea where you are.

I would try these first. If that fails, then you can start typing in logging statements.

If the debugger mitigates the crash, this can be for these reasons:

  • memory corruption: under a debug build memory is allocated with extra space before and after each block, so rogue writes may land in that padding and not corrupt anything under a debug session
  • timing and multi-threading: the debugger alters timing of threads and can make tricky multi-threaded problems hard to nail down.
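
The memory-corruption point can be sketched with a small, portable model of a padded allocation. This is not the actual Visual C++ debug heap; GuardedBuffer, guardsIntact and offByOneIsDetected are hypothetical names, and the guard size and fill value (4 bytes of 0xFD) are assumptions chosen to resemble what a debug heap does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Assumed guard layout for this sketch: 4 fill bytes on each side.
const std::size_t kGuard = 4;
const unsigned char kFill = 0xFD;

// A buffer padded with guard bytes before and after the usable region.
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t size)
        : size_(size), storage_(size + 2 * kGuard) {
        std::memset(&storage_[0], kFill, storage_.size());  // fill everything
        if (size_ > 0) std::memset(data(), 0, size_);        // clear user area
    }

    // Pointer to the usable region, just past the leading guard.
    unsigned char* data() { return &storage_[0] + kGuard; }

    // True if no guard byte on either side has been overwritten.
    bool guardsIntact() const {
        for (std::size_t i = 0; i < kGuard; ++i) {
            if (storage_[i] != kFill) return false;
            if (storage_[kGuard + size_ + i] != kFill) return false;
        }
        return true;
    }

private:
    std::size_t size_;
    std::vector<unsigned char> storage_;
};

// An off-by-one write lands in the trailing guard instead of live data,
// so nothing else is damaged, but the damage to the guard can be detected.
bool offByOneIsDetected() {
    GuardedBuffer buf(8);
    buf.data()[7] = 42;  // last valid byte
    assert(buf.guardsIntact());
    buf.data()[8] = 42;  // one past the end: hits the guard
    return !buf.guardsIntact();
}
```

Without the padding, the same stray write would hit whatever happens to sit next to the buffer, which is one way a bug can stay quiet in one memory layout and crash in another.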
mdma
I'll look around and see if these guys still exist. Thanks for the suggestions.
BoundsChecker still exists: Micro Focus acquired it from Compuware. Its checking is more invasive than Rational's Purify, but it can prove more useful for special cases (beware of the different timing behavior).
jdehaan
Yeah, I started looking at that...looks like it's a bit over a grand - more $$ than I was planning...may resort to it, though. Thanks!
@mdma: thanks for the suggestions on the causative factors. Hopefully this isn't a threading problem - although it's an MFC app (so, inherently multithreaded), there is no special multithreading that is specific to this application. The explanation for debug vs. non debug memory model makes sense. Hadn't thought of that!
A: 

If it's memory corruption, a memory tracking/diagnostic tool (I used to use BoundsChecker to great effect in the good old days of C++) may help you to locate and fix the cause in minutes, where any other technique could take days or even months.

For other cases, you've suggested another approach yourself: a sometimes labour-intensive but very effective approach to getting a "real" stack trace is to simply use printf - a vastly underrated debugging tool available in every environment. If you have a rough idea you can straddle the crash area with only a few log messages to narrow down the location, and then add more as you home in on the problem area. This can often unearth enough clues that you can isolate the cause of the crash in a few minutes, even though it can seem like a lot of work and perhaps a hopeless cause before you start.
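
A minimal sketch of this kind of checkpoint logging follows. The function names (logCheckpoint, lastCheckpoint) and the log file handling are hypothetical. The one detail that matters is that each message is flushed and the file closed immediately, so the message has left the process's buffers before the next statement gets a chance to crash.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Append one line to the log and push it out of the process right away.
// Opening and closing per message is slow, but nothing is lost in a crash.
bool logCheckpoint(const std::string& path, const std::string& message) {
    assert(!path.empty());
    std::FILE* f = std::fopen(path.c_str(), "a");
    if (!f) return false;
    std::fprintf(f, "%s\n", message.c_str());
    std::fflush(f);
    std::fclose(f);
    return true;
}

// After a crash, the last non-empty line in the log names the last
// checkpoint the program reached.
std::string lastCheckpoint(const std::string& path) {
    std::ifstream in(path.c_str());
    std::string line, last;
    while (std::getline(in, line)) {
        if (!line.empty()) last = line;
    }
    return last;
}
```

In use, you would put one logCheckpoint call before and one after a suspect region, run the standalone EXE until it crashes, and then read the last line of the log. If the "before" message is there and the "after" message is not, the crash is inside that region and you can add finer checkpoints there.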

edit:

Also, if you have the application under source control, then get a historical version from when you think it was working, and then do a binary chop between that date and "now" to isolate when the issue began to occur. This can often narrow down a bug to the precise checkin that introduced the bug, and if you're lucky it will point you at a few lines of code. (If you're unlucky the bug won't be so easily repeatable, or you'll narrow it down to a 500-file checkin where a major refactoring or similar took place)
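
The binary chop can be sketched as an ordinary bisection over revision numbers. Everything here is illustrative: firstBadRevision is a hypothetical helper, and the crashes callback stands in for the manual step of checking out a revision, building it and seeing whether it crashes. It assumes the known-good revision really is good, the known-bad one really is bad, and every revision after the first bad one also crashes.

```cpp
#include <cassert>
#include <functional>

// Returns the first revision in (good, bad] for which crashes(rev) is true.
// `good` must be a revision known to work, `bad` one known to crash.
int firstBadRevision(int good, int bad,
                     const std::function<bool(int)>& crashes) {
    assert(good < bad);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;  // revision halfway between
        if (crashes(mid)) {
            bad = mid;   // the bug was introduced at or before mid
        } else {
            good = mid;  // the bug was introduced after mid
        }
    }
    return bad;
}
```

Each probe halves the range, so 100 candidate revisions need at most 7 builds. If the crash is intermittent, the callback has to repeat the run enough times to be trusted, otherwise the search can converge on the wrong checkin.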

Jason Williams
Thanks. Yeah, I've used this technique in the past, when IDEs and debuggers REALLY sucked. And, having familiarity with the application would allow me to break it down into likely regions, and home in on the issue, as you suggest. I think I will look into the C++/lint variants, though, as well as some of these other checking programs - the more crap they can catch, the better for me, as long as they don't send me down blind alleys.
Yeah, code analysis/lint tools are definitely worth trying: if they were not used during development of 200k lines of code, there is a good chance that they will pick up some (potential) bugs that are worth fixing. Oh, just had another thought - added to the main answer body.
Jason Williams
A: 

I can't recommend Mark Russinovich's blog highly enough. Absolutely brilliant guy from whom you can learn a whole bunch of debugging techniques for Windows and much more. Especially try reading some of the "The Case of" series! Amazing stuff!

For example, take a look at this case he investigated - a crash of IE. He shows how to capture the stack of the failing thread and much more interesting stuff. His main tools are the Windows debugging tools and also his Sysinternals tools!

Enough said. Go read it!

Also I would recommend the book Windows Internals, 5th edition. Again by Mark and company.

Ivan Zlatanov
I love Russinovich's Sysinternals tools! Hadn't looked at his blog too closely, though. Sounds like that was a major oversight. Thank you.
A: 

Get the debugging tool kit from MS ( http://www.microsoft.com/whdc/devtools/debugging/default.mspx ).

Set adplus up for crash mode monitoring ( http://www.microsoft.com/whdc/devtools/debugging/default.mspx ).

This should get you a crash dump when the app crashes. Load the dump up in WinDbg from the debugging toolkit and analyze it there. It is a painful, but very powerful, process for analyzing out-of-debugger crashes.

There are quite a few resources around for using WinDbg - a good book on general Windows unmanaged debugging and the tools in the debugging kits is: http://www.amazon.com/Advanced-Windows-Debugging-ebook/dp/B000XPNUMW

S.Skov
Excellent! Thank you.