I have a program written in C#, running on a Windows CE device (on Compact Framework). It processes minimal user actions (button clicks), uses serial port and TCP/IP communication.

The problem is: sometimes the software shuts down on its own. Well, at least the main form can't be seen any more. In the background the application (or parts of it) seems to still be running (at least in one documented case it was), because it still holds the serial port, so restarting the application doesn't help. I can't reproduce the problem, since in most cases it happens when there is no user interaction, no serial port communication, and the network communication is all "I am still alive" messages; the software just crashes seemingly without reason. (I'm trying to make it happen in debug mode so that, if it is a software bug, I at least know where the problem is in the code, but I've had no luck so far.)

Since I'm running out of ideas, the question is: what bug, exception, OS action or hardware malfunction can cause such behavior? The problem has been seen on different devices of the same type, so it shouldn't be a hardware error. (Or all my hardware has the same error.) Exceptions are handled, so it shouldn't be an exception. Unhandled exceptions are handled too, so it shouldn't be an unhandled exception either. (My guess is that it is caused by a StackOverflowException, because I don't know of any other exception that can't be caught, but there isn't any recursion in the code, at least not intentionally, so that shouldn't be a possibility either.)

+1  A: 

Quite a few exceptions cannot be caught. These are called first chance exceptions. And some exceptions can be caught and logged, but cannot be recovered from (memory exceptions). However, it is possible to debug them. Here's a post at CodeProject that explains this principle.

What you should do is select some candidate exceptions in Visual Studio that you suspect might occur, and attach the debugger to the running instance.

Having unmanaged resources as you have (serial port) means you can have unmanaged leaks (not using IDisposable + using properly) and unmanaged exceptions. These exceptions can only be caught with an empty catch (i.e., without specification of even Exception, which is not the parent of unmanaged exceptions) in a try/catch block.
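
A minimal sketch of the IDisposable + using pattern mentioned above, assuming the port is driven through System.IO.Ports.SerialPort. The port name "COM1", the baud rate 9600 and the "ping" payload are placeholders for illustration, not values from the question:

```csharp
using System;
using System.IO.Ports;

static class SerialPortSketch
{
    static void Main()
    {
        try
        {
            SendPing();
        }
        catch (Exception ex)
        {
            // Opening fails if the port is missing or still held by another process.
            Console.WriteLine("Serial port error: " + ex.Message);
        }
    }

    // Placeholder port settings; adjust to the real device.
    static void SendPing()
    {
        using (SerialPort port = new SerialPort("COM1", 9600))
        {
            port.Open();
            port.WriteLine("ping");
        } // Dispose() runs here, releasing the port even if Open() or WriteLine() throws
    }
}
```

Releasing the port deterministically matters in this case because a half-dead process that still holds the port is exactly what prevents the restarted application from opening it.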

PS: some undefined behavior can occur when exceptions are raised in finally blocks or in finalizers/destructors. Also, not many exceptions propagate across thread boundaries and terminate all threads.

Edit

To make things a little clearer, there are a few exceptions that the CLR (and its specification) define as non-catchable. Basically, these are all exceptions that cross thread boundaries. These asynchronous exceptions, when occurring within a lock, will result in state corruption. Best known are OutOfMemoryException, ThreadAbortException and StackOverflowException. When the OutOfMemoryException or StackOverflowException occurs in synchronous code, it's unlikely that you can correct state and the CLR will terminate your application.

In addition there's the ExecutionEngineException and BadImageFormatException which should not happen in verifiable code and should not be caught. Exceptions such as the TypeLoadException and MissingMemberException can sometimes be caught and sometimes not (if a linked assembly is missing, it'll be hard to catch these, and you shouldn't, but if you use reflection, you should catch these).

In short: exceptions should be caught in the thread they happen in. You will not be able to catch exceptions if they happen in another thread, because they are not propagated (with the exception of the ThreadAbortException). Your application stays alive after an exception (at least, you think), so it is logical to assume that the exception doesn't happen in the thread where you're trying to catch it. Using the Debug > Exceptions window, you can select any exception and break on the code when they happen.

Note on Exception

An added note on managed and unmanaged exceptions. You cannot catch an unmanaged exception using catch (Exception e), because the unmanaged exception does not inherit from Exception. Instead, use an empty catch, which will trap any unmanaged exception for you. Wrap this around your application and thread entrypoint methods and you should be able to catch the majority of catchable exceptions.
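
As a rough sketch of what wrapping a thread entry point could look like: the names SafeEntryPoint, DoWork and Log are hypothetical, and the simulated failure only stands in for whatever the real worker does. On runtimes that wrap non-Exception throws, the C# compiler may flag the bare catch clause as redundant (warning CS1058); it is kept here to mirror the advice above.

```csharp
using System;
using System.Threading;

static class WorkerSketch
{
    static void Main()
    {
        Thread worker = new Thread(new ThreadStart(SafeEntryPoint));
        worker.Start();
        worker.Join();
    }

    // Outermost method of the thread: the handlers live in the same
    // thread that throws, so the failure is at least logged.
    static void SafeEntryPoint()
    {
        try
        {
            DoWork();
        }
        catch (Exception ex)
        {
            Log("Managed exception on worker thread: " + ex);
        }
        catch
        {
            // Bare catch: reached only for thrown objects not derived from Exception.
            Log("Non-Exception object thrown on worker thread");
        }
    }

    // Hypothetical work item that fails, for demonstration only.
    static void DoWork()
    {
        throw new InvalidOperationException("simulated failure");
    }

    // Hypothetical logger; a device would more likely append to a file.
    static void Log(string message)
    {
        Console.WriteLine(message);
    }
}
```

Logging to persistent storage rather than the console is probably the more useful choice on a device, since the crash happens when nobody is watching.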

Abel
As far as I can tell from googling, first chance exceptions occur only in debug mode, don't they?
ytg
@ytg: No, they don't. But more importantly, you should find out *what* exception is thrown. Have you already attached the debugger to a running instance?
Abel
Yes, I've been trying to catch every possible exception with VS for two or three days, but I've had no luck reproducing the problem yet.
ytg
+1  A: 

If you start secondary threads using the Thread class and don't mark them as background threads, they will keep your process running until they exit, even if the main thread has completed (i.e., the main form has closed and the Main method has returned).
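
A small sketch of the difference, using a hypothetical HeartbeatLoop as a stand-in for the "still alive" network messages. The sleep intervals are arbitrary:

```csharp
using System;
using System.Threading;

static class BackgroundThreadSketch
{
    static void Main()
    {
        Thread heartbeat = new Thread(new ThreadStart(HeartbeatLoop));

        // Without this line the endless loop below is a foreground thread
        // and keeps the process alive after Main returns.
        heartbeat.IsBackground = true;
        heartbeat.Start();

        Thread.Sleep(1500); // stand-in for the lifetime of the main form
        Console.WriteLine("Main returning");
    }

    // Hypothetical worker that never exits on its own.
    static void HeartbeatLoop()
    {
        while (true)
        {
            Console.WriteLine("still alive");
            Thread.Sleep(500);
        }
    }
}
```

With IsBackground set, the process ends when Main returns; with the line removed, the process lingers with no visible window, which resembles the symptom described in the question.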

If you had had a StackOverflowException, your process would've been killed by Windows outright, so that isn't it.

Lasse V. Karlsen
I checked, and all of my non-background threads are aborted on application exit, so the main thread has to crash before it can abort the threads. The question remains: what could possibly cause only the main thread to crash?
ytg
+1  A: 

You likely have a native exception or an access violation (which will typically manifest as a first-chance exception). No amount of managed exception handling can trap one of these - the key is to not cause the exception in the first place.

Are you p/invoking or making unsafe calls? If you're calling an API and causing something like a buffer overrun or stack corruption, you would see this kind of behavior (though often you'll get an OS dialog complaining).

Tracking these down is often tough. This is a CE device - does it have a debug port (it's typically a serial port)? It's very likely that an exception will dump a message there, so if you have access to one, that is a good starting point. If you have KITL, running the OS in debug while the app runs may also trap whatever is happening.

ctacke