Here's the situation:

Background

I have a mixed mode .NET/Native application developed in Visual Studio 2008.

What I mean by mixed mode is that the front end is written in C++ .NET which calls into a native C++ library. The native code does the bulk of the work in the app, including kicking off new threads as it requires. The .NET code is just for UI purposes (win forms).

I have a release build of application running on a tester's computer.

The native libraries were compiled with full optimisations but also with debugging enabled (the "Debug Information Format" was set to "Program Database").

What this means is that I have the debugging symbols for the application in a PDB file.

The problem

So anyway, one of the testers is having a problem with the app where it occasionally crashes on XP. I've been able to get the minidump of the crash using Dr Watson for several runs.

When I debug into it (using the minidump - I'm not actually debugging the real app), all the debugging symbols are loaded correctly: I can see the full stack trace of all of the native threads correctly. Other threads (which are presumably the .NET threads) don't have a stack trace, but they all at least show me which dll the thread was started on (i.e. ntdll.dll).

It correctly reports the thread which fails ("Unhandled exception at 0x0563d652 in user(5).dmp: 0xC0000005: Access violation reading location 0x00000000").

However, when I go into the thread it shows nothing useful. In the stack trace there is a single entry which just has the memory address "0563d652()" (not even "ntdll.dll").

When I go into the disassembly it just shows a random section of about 30 instructions. Either side of that block of memory is just "???". It almost looks like it is not part of my code at all (isn't your binary loaded sequentially into memory? Is it normal to have a random set of assembly statements in the middle of nowhere?).

My questions

So basically my questions are threefold.

1) Can anyone explain the debugger's lack of information?

2) Bearing in mind that I can't show the error occurred in my code, can anyone suggest a reason for the failure?

3) Can I do anything else to help me diagnose this kind of problem in the future?

Help!

John

Update:

Here is the stack dump for the failing thread from WinDBG

 # ChildEBP RetAddr  
WARNING: Frame IP not in any known module. Following frames may be wrong.
00 099bf414 02d0e7fc 0x563d652
01 00000000 00000000 0x2d0e7fc

Weird huh? Doesn't even show a DLL.

Is it possible that I've corrupted the stack/heap somehow which has caused a thread to just get corrupted...?

+3  A: 

Are you using WinDbg? If so, are you using the Son of Strike (SOS) extension?

Bugslayer: Son-of-Strike

-or-

Drill Into .NET Framework Internals to See How the CLR Creates Runtime Objects?

arul
No, using the Visual Studio 2008 integrated debugger. I will give WinDbg a go though - good suggestion, thanks.
John
Same result with WinDBG I'm afraid. As for SOS, it recommends that you use a full dump instead of a minidump (which is all I've got). Will see if I can get a full dump to try.
John
Well, a minidump could also be a full dump in a sense; it depends on which options it was created with. Besides, using WinDbg is always a good idea ;o)
John, don't bother with dear dr. wtsn's minidumps. If you have Windbg (which is the right way to go, IMHO), attach adplus to your app, and have it create a full dump upon crash. The QA is probably near, why settle for minidumps?
eran
+1  A: 

We had an issue similar to this where a code bug was silent in MSVC2K5 SP1, but if you had the MSVC2K5 SP2 runtime installed it caused an error which didn't point at valid code.

Part of the problem is that once you start executing data as code, you could be doing anything. The crash location becomes useless because you cannot even get back to a valid stack trace.

We had this happen to us when the new .Net runtime install installed a newer version of the MSVC C++ Runtime in the SxS directory.

In the end our method to resolve the issue was to make the crash happen frequently and add as much logging as necessary to localize it.
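To illustrate how a thread can end up "executing data as code", here is a minimal sketch of one possible mechanism: an unchecked copy into a local buffer that overwrites the saved return address next to it. This is only a simulation and not a diagnosis of the crash in the question. `FakeFrame`, the 16-byte buffer size and the helper names are all made up for the example, and a real x86 frame has more in it than this.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of a stack frame: a 16-byte local buffer
// immediately followed by a 4-byte saved return address.
struct FakeFrame {
    unsigned char bytes[20];
};

constexpr std::size_t kBufferSize = 16;    // size of the "local buffer"
constexpr std::size_t kReturnOffset = 16;  // where the "return address" lives

// Read the simulated saved return address out of the frame.
std::uint32_t read_return_address(const FakeFrame& f) {
    std::uint32_t value;
    std::memcpy(&value, f.bytes + kReturnOffset, sizeof value);
    return value;
}

// Store a simulated saved return address into the frame.
void write_return_address(FakeFrame& f, std::uint32_t value) {
    std::memcpy(f.bytes + kReturnOffset, &value, sizeof value);
}

// The bug being illustrated: len is never checked against kBufferSize.
// It is clamped to the whole frame only so that the simulation itself
// stays well-defined; a real overrun would have no such limit.
void unchecked_copy(FakeFrame& f, const unsigned char* src, std::size_t len) {
    if (len > sizeof f.bytes) {
        len = sizeof f.bytes;
    }
    std::memcpy(f.bytes, src, len);
}
```

Copying 8 bytes leaves the simulated return address alone; copying 20 bytes replaces it with whatever the input data happened to contain. In a real process the function would then "return" to that value, and the debugger would report an instruction pointer that is not inside any module.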

Greg Domjan
Good suggestion - I suspect that I have the VC2008 SP1 redist whereas the test machine has VC2008 (no SP). Might upgrade the machine and see if that helps.
John
+1  A: 

Could you post the stack of the faulting thread once you've grabbed and installed a copy of WinDbg and opened the dump file there? We could start from there.

A very kind offer, thank you - please see the updated question.
John
Well, there's not much to go on from the stack. Have you fixed up the symbols (both your private ones and the public system ones)? What is the state of the registers ('r' in WinDbg)? Does eip point to valid code ('u eip' in WinDbg)?
A: 

Your EIP was just corrupted.
Assuming the ESP is valid, you can view the callstack, just type:
dds esp [enter]
dds [enter]

You can also use the memory windows:
Set address to: esp
Set format to: Pointer&Symbol
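What this kind of raw stack dump gives you can be sketched roughly as follows: walk the stack one pointer-sized slot at a time and flag every value that falls inside a loaded module, since those are candidate return addresses. This is only an illustration of the idea, not how WinDbg implements `dds`. `ModuleRange`, `Candidate`, `find_module` and `scan_stack` are hypothetical names, and a real debugger also resolves each hit to the nearest symbol.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a loaded module: name plus [base, base + size).
struct ModuleRange {
    std::string name;
    std::uint32_t base;
    std::uint32_t size;
};

// One stack slot whose value lies inside a known module.
struct Candidate {
    std::size_t slot;      // slot index, counted upwards from ESP
    std::uint32_t value;   // the pointer-sized value stored in that slot
    std::string module;    // name of the module containing the value
};

// Return the module that contains addr, or nullptr if no module does.
const ModuleRange* find_module(const std::vector<ModuleRange>& modules,
                               std::uint32_t addr) {
    for (const auto& m : modules) {
        if (addr >= m.base && addr - m.base < m.size) {
            return &m;
        }
    }
    return nullptr;
}

// Scan 32-bit stack slots and collect every value that points into a module.
// Hits are only candidates: stale data left on the stack can look like a
// return address too, so each one still needs checking by hand.
std::vector<Candidate> scan_stack(const std::vector<std::uint32_t>& stack,
                                  const std::vector<ModuleRange>& modules) {
    std::vector<Candidate> hits;
    for (std::size_t i = 0; i < stack.size(); ++i) {
        if (const ModuleRange* m = find_module(modules, stack[i])) {
            hits.push_back({i, stack[i], m->name});
        }
    }
    return hits;
}
```

Given a list of module ranges and the slots read from the dump starting at ESP, `scan_stack` returns only the slots worth looking at. An address such as 0x0563d652 that lies outside every range makes `find_module` return `nullptr`, which corresponds to the "Frame IP not in any known module" warning in the question.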

Tal