I have a large .NET Compact Framework 2.0 application that works very well in most cases. On certain devices, about once a day, a user receives a native error 0xC0000005 that is not caught by a standard managed try/catch block.

My application synchronizes with the server via ASMX calls at fixed intervals, and the problem appears to occur during synchronization. Considerable business logic runs alongside the ASMX call at synchronization time, but 98% of it is managed code. I've reviewed all my P/Invokes and the application's native C++ libraries, and at this point I'm about 95% certain the problem isn't there.

Since this only happens on certain devices and very infrequently (less than once a day), it's very difficult to isolate. I've instrumented my code, and the failure appears to happen in random places within the application, so I suspect something is corrupting memory.

Any thoughts on how to troubleshoot this further would be appreciated.

+4  A: 

A 0xC0000005 is an access violation, so something is trying to read from or write to an address that it doesn't have rights to access. These tend to be really hard to find, and experience is one of the best tools. (Platform Builder's debugger is really helpful too, but that's a whole separate avenue of debugging and requires experience that you probably don't have, or you'd have already tried it.) I find that logging tends to be less useful than subtractive coding: replacing P/Invoke calls with mock managed calls wherever possible until the crash stops.

Access violations in managed apps typically happen for one of these reasons:

  • You P/Invoke a native API passing in a handle to a managed object and the native API uses that handle. If you get a collection and compaction while the native API is running, the managed object may move and the pointer becomes invalid.
  • You P/Invoke something with a buffer that is too small, or smaller than the size you pass in, and the API overruns it on a read or write.
  • A pointer (IntPtr, etc.) you pass to a P/Invoke call is invalid (-1 or 0) and the native code isn't checking it before use.
  • You P/Invoke a native call and the native code runs out of memory (usually virtual), isn't checking for failed allocations, and then reads from or writes to an invalid address.
  • You use a GCHandle that is not initialized or that somehow is pointing to an already finalized and collected object (so it's not pointing to an object, it's pointing to an address where an object used to be)
  • Your app uses a handle to something that got invalidated by a sleep/wake. This is more esoteric but certainly happens. For example, if you're running an application off of a storage card, the entire app isn't loaded into RAM. Pieces in use are demand-paged in for execution. This is all well and good. Now if you power the device off, the drivers all shut down. When you power back up, many devices simply re-mount the storage devices. When your app needs to demand-page in more program, it's no longer where it was and it dies. Similar behavior can happen with databases on mounted stores. If you have an open handle to the database, after a sleep/wake cycle the connection handle may no longer be valid.

You'll note the trend here that almost all of these are P/Invokes and that's no accident. It's quite difficult to get managed code to do this on its own.
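
Several of the causes above come down to native code trusting what it was handed. As a rough sketch only (the function name `CopyDeviceLabel`, its parameters, and the status codes are hypothetical and not taken from the application in question), a native export could check the pointer, the buffer size, and its own allocation before touching memory:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>
#include <new>

// Hypothetical status codes for this sketch (not from any real API).
enum CopyStatus {
    COPY_OK = 0,
    COPY_BAD_POINTER = 1,
    COPY_BUFFER_TOO_SMALL = 2,
    COPY_OUT_OF_MEMORY = 3
};

// True for the two obviously invalid values mentioned above: 0 and -1.
static bool IsObviouslyInvalid(const void* p)
{
    return p == nullptr ||
           reinterpret_cast<std::uintptr_t>(p) == static_cast<std::uintptr_t>(-1);
}

// Hypothetical export that a managed caller might P/Invoke.
// destChars is the capacity of dest in wchar_t units, including the terminator.
extern "C" int CopyDeviceLabel(const wchar_t* source, wchar_t* dest, std::size_t destChars)
{
    // Check the pointers before any dereference.
    if (IsObviouslyInvalid(source) || IsObviouslyInvalid(dest))
        return COPY_BAD_POINTER;

    // Never write more than the caller says the buffer can hold.
    const std::size_t needed = std::wcslen(source) + 1;
    if (needed > destChars)
        return COPY_BUFFER_TOO_SMALL;

    // Use a non-throwing allocation and test the result instead of assuming success.
    wchar_t* scratch = new (std::nothrow) wchar_t[needed];
    if (scratch == nullptr)
        return COPY_OUT_OF_MEMORY;

    std::memcpy(scratch, source, needed * sizeof(wchar_t));
    std::memcpy(dest, scratch, needed * sizeof(wchar_t));
    delete[] scratch;
    return COPY_OK;
}
```

Checks like these cannot detect a pointer that is non-null but stale (for example, a managed object that has moved), so they narrow the search rather than end it.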

ctacke
Thanks for the thorough feedback, ctacke. After a thorough code review, the only thing I see that falls into one of your categories above is a StringBuilder that I'm not pinning before sending it to a P/Invoke. I read somewhere that the marshalling somehow handles that. Do you know if that is correct?
Kevin
I'm just about to try the subtractive coding method, but with a minimum of one day between failures, this may be difficult. It seems to happen only on certain devices, when the device is in sleep mode and not charging and wakes up to sync. Could this be something in the BSP or in the Compact Framework's native code?
Kevin
In Windows CE you can *never* assume the platform is not to blame. It certainly could be an issue there. As for the StringBuilder, it depends on usage. If it's a synchronous call, you're safe. If it's asynchronous, you're not.
ctacke
Thanks again for the additional details. It's a synchronous call, so I guess I'm safe. The crash report has three values: ExceptionCode, which is 0xC00; ExceptionAddr:0x12341234; and Reading:0x0000000. I assume 0x00 for Reading would probably mean a failed memory allocation.
Kevin
And 0x[OTHERADDR] for Reading would mean a managed object probably got moved or something got corrupted. Correct? After the fact, is there any significance to the ExceptionAddress value with respect to the DLL image? Thanks again for your help, this one has me stumped.
Kevin
No, they certainly are invalid addresses. The exception address of 0x12341234 looks like a heap corruption as well.
ctacke
+1  A: 

My native C++ code was not compiled with asynchronous exception handling enabled, and thus its exception handlers were not catching access violations.

This may or may not be helpful for my problem, but it might be helpful for others.

Using the /EHa switch as documented in this link will allow for catching these types of exceptions:

http://msdn.microsoft.com/en-us/library/1deeycx5.aspx
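
As a minimal sketch of what that switch changes (the names `RunGuarded`, `SyncStep`, and `kSyncFailed` are made up for illustration and are not from my code), a guard like the following only sees access violations when the file is built with MSVC and /EHa. Under /EHsc, or with another compiler, the same `catch (...)` only sees ordinary C++ exceptions:

```cpp
#include <stdexcept>

// Hypothetical status value for this sketch.
const int kSyncFailed = -1;

// Hypothetical signature for one native step of the synchronization.
typedef int (*SyncStep)();

// Runs one native step and turns any exception into a status code.
// Built with MSVC and /EHa, catch (...) also receives structured exceptions
// such as access violations (0xC0000005). Built with /EHsc, or with a
// non-MSVC compiler, it only receives C++ exceptions.
int RunGuarded(SyncStep step)
{
    if (step == nullptr)
        return kSyncFailed;
    try {
        return step();
    } catch (...) {
        // A good place to log. After an access violation the process state may
        // already be corrupted, so treat this as a diagnostic aid, not a fix.
        return kSyncFailed;
    }
}

// Stand-in steps used only to exercise the guard.
int StepThatSucceeds() { return 42; }
int StepThatThrows() { throw std::runtime_error("simulated failure"); }
```

Catching the exception this way does not remove whatever is corrupting memory; it only gives the native layer a chance to record where the fault surfaced.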

Kevin