I have a managed-code Windows Service application that occasionally crashes in production due to a managed StackOverflowException. I know this because I've run adplus in crash mode and analyzed the crash dump post mortem using SOS. I have even attached the WinDbg debugger and set it to "go unhandled exception".

My problem is, I can't see any of the managed stacks or switch to any of the threads. They're all being torn down by the time the debugger breaks.

I'm not a WinDbg expert. Short of installing Visual Studio on the live system or debugging remotely with it, does anyone have any suggestions for how I can get a stack trace out of the offending thread?

Here's what I'm doing.

!threads

...

XXXX 11 27c 000000001b2175f0 b220 Disabled 00000000072c9058:00000000072cad80 0000000019bdd3f0 0 Ukn System.StackOverflowException (0000000000c010d0)

...

And at this point you see the XXXX ID indicating the thread is quite dead.

A: 

Is it an option to wrap your code with a try-catch that writes to the EventLog (or file, or whatever) and run this debug one-off?

try { ... } catch (StackOverflowException) { EventLog.WriteEntry(...); throw; }

You won't be able to debug, but you would get the stack trace.

Erich Mirabal
A: 

One option you have is to use a try/catch block at a high level, and then print or log the stack trace provided by the exception. Every exception has a StackTrace property that can tell you where it was thrown from. This won't let you do any interactive debugging, but it should give you a place to start.

Eric Burnett
I just had this weird feeling of deja vu... :)
Erich Mirabal
Heh, I just re-read your answer and I see your point :P. Oh well, it's probably worth being explicit that exceptions carry the stack they were thrown from, in case it isn't obvious.
Eric Burnett
+7  A: 

Once you've hit a stack overflow, you're pretty much out of luck for debugging the problem. Blowing your stack space leaves your program in a non-deterministic state, so you can't rely on any of the information in it at that point: any stack trace you try to get may be corrupted and can easily point you in the wrong direction. I.e., once the StackOverflowException occurs, it's too late.

Also, according to the documentation, you can't catch a StackOverflowException from .NET 2.0 onwards, so the other suggestions to surround your code with a try/catch for that probably won't work. This makes perfect sense, given the side effects of a stack overflow (I'm surprised .NET ever allowed you to catch it).

Your only real option is to engage in the tedium of analyzing the code, looking for anything that could potentially cause a stack overflow, and putting in some sort of markers so you can get an idea of where overflows occur before they occur. E.g., any recursive methods are the obvious first place to start, so give them a depth counter and throw your own exception if they reach some "unreasonable" value that you define. That way you can actually get a valid stack trace.
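A minimal sketch of the depth-counter idea, in case it helps. All names here (Walker, Walk, MaxDepth) are hypothetical stand-ins, and the limit of 1000 is an assumed value; choose one comfortably above any legitimate recursion depth in your own code. Because the guard throws an ordinary exception rather than overflowing the stack, a high-level catch can log it with its stack trace intact.

```csharp
using System;

public static class Walker
{
    // Assumed "unreasonable" depth; tune for the real method.
    public const int MaxDepth = 1000;

    // Hypothetical recursive method. 'remaining' stands in for real work;
    // 'depth' is the counter added purely for diagnostics.
    public static int Walk(int remaining, int depth = 0)
    {
        if (depth > MaxDepth)
        {
            // Thrown long before the real stack limit, so the trace is valid.
            throw new InvalidOperationException(
                "Recursion depth exceeded " + MaxDepth + "; probable runaway recursion.");
        }

        if (remaining == 0)
        {
            return depth;
        }

        return Walk(remaining - 1, depth + 1);
    }
}

public static class Program
{
    public static void Main()
    {
        // A legitimate call stays under the limit and returns its depth.
        Console.WriteLine(Walker.Walk(10)); // prints 10

        try
        {
            // Simulates runaway recursion; the guard trips at depth 1001.
            Walker.Walk(int.MaxValue);
        }
        catch (InvalidOperationException ex)
        {
            // Unlike StackOverflowException, this can be caught and logged.
            Console.Error.WriteLine(ex);
        }
    }
}
```

In a service you would write the exception to the event log or a file instead of the console. If adding a parameter to the method is not practical, a per-thread counter incremented on entry and decremented in a finally block is one alternative.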

Not Sure
That's interesting. I hadn't even noticed that had changed. The last time I had one of these was when I mistyped the property/member getter and got the endless recursive calls (and I was able to catch and debug it back then). +1 for actually reading the latest documentation.
Erich Mirabal
A: 

For what it's worth, starting in .NET 4.0, Visual Studio (and any debuggers that rely on the ICorDebug API) gain the ability to debug minidumps. This means you will be able to load the crash dump into the VS debugger on a different computer and see the managed stacks, much as if you had attached a debugger at the time of the crash. See the PDC talk or Rick Byers' blog for more information. Unfortunately this won't help you with the problem at hand, but perhaps it will next time you run into this issue.

Eric Burnett
A: 

Take a look at your ADPLUS Crash Mode Debug Log. See if there are any access violations or true native stack overflow exceptions happening before the managed StackOverflowException is thrown.

My guess is that there is an exception on the thread's stack that you could catch before the thread exits.

You could also use DebugDiag from www.iis.net, set a Crash rule, and create a full dump file for access violations (sxe av) and native stack overflow exceptions (sxe sov).

Thanks, Aaron

AaronBa