Hi,

I have a bug somewhere that is causing my app to just vanish without an error message or anything like that. The app just disappears from the screen and is no longer listed in Task Manager.

The app is a C++Builder app (CBuilder2007), and I have tried everything I can think of to catch this error. It happens very seldom: it has never crashed on my machine and only once on the test machines we have in the office. With one of our customers it happens a little more frequently, but we haven't found a way to make it happen, or the circumstances in which it happens. It is a heavily multithreaded app.

I have madExcept enabled in this app, but it doesn't catch anything. I have already added handlers using the set_terminate and set_unexpected RTL routines, without any luck.

The only info I have is from a "loader app" wrapper I wrote to get the return code from the main app. It exits with the C0000005 code, which I believe means an Access Violation happened. The strange thing is that, as mentioned, there is not even the Windows error box or anything like that.

The question would be: any ideas on how to catch this? I don't even have a clue where this might be happening (I have a lot of logging around the app, but the "trail" before the app crashes hasn't led anywhere). My idea with the set_terminate and set_unexpected routines was to get a stack trace to see where the error was generated, but so far those routines aren't being called at all (at least the only time this has happened here in my office).

Thanks in advance


[Update 22.Sept.2009] Using AddVectoredExceptionHandler I was able to get a callstack from the crash, and now I can start trying to isolate and fix the bug. Thanks!!!

+5  A: 

terminate/unexpected get called only by the C++ runtime, and only for C++ exceptions.

An access violation is an SEH exception. To catch it, you need SetUnhandledExceptionFilter or AddVectoredExceptionHandler (available on XP and later). You could then create a minidump using MiniDumpWriteDump and related functions.
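A minimal sketch of the vectored-handler-plus-minidump idea, assuming a Windows build where dbghelp.h is available and dbghelp.dll can be loaded at run time. The names IsFatalCode, CrashHandler, InstallCrashHandler and the file name crash.dmp are hypothetical, chosen only for illustration. Vectored handlers also see first-chance exceptions that the program may handle later, so the sketch reacts only to a couple of codes and always lets the normal search continue.

```cpp
// Exception codes of interest; the values match the Windows SDK constants
// EXCEPTION_ACCESS_VIOLATION and EXCEPTION_STACK_OVERFLOW.
const unsigned long kAccessViolation = 0xC0000005UL;
const unsigned long kStackOverflow   = 0xC00000FDUL;

// Hypothetical helper: decide which exception codes are worth a dump.
bool IsFatalCode(unsigned long code) {
    return code == kAccessViolation || code == kStackOverflow;
}

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>

// MiniDumpWriteDump is resolved at run time, so no import library is needed.
typedef BOOL (WINAPI *MiniDumpWriteDumpFn)(
    HANDLE, DWORD, HANDLE, MINIDUMP_TYPE,
    PMINIDUMP_EXCEPTION_INFORMATION,
    PMINIDUMP_USER_STREAM_INFORMATION,
    PMINIDUMP_CALLBACK_INFORMATION);

static MiniDumpWriteDumpFn g_writeDump = NULL;

LONG CALLBACK CrashHandler(PEXCEPTION_POINTERS info) {
    if (g_writeDump && IsFatalCode(info->ExceptionRecord->ExceptionCode)) {
        // Assumption: the working directory is writable; "crash.dmp" is illustrative.
        HANDLE file = CreateFileA("crash.dmp", GENERIC_WRITE, 0, NULL,
                                  CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (file != INVALID_HANDLE_VALUE) {
            MINIDUMP_EXCEPTION_INFORMATION mei;
            mei.ThreadId = GetCurrentThreadId();
            mei.ExceptionPointers = info;
            mei.ClientPointers = FALSE;
            g_writeDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                        MiniDumpNormal, &mei, NULL, NULL);
            CloseHandle(file);
        }
    }
    return EXCEPTION_CONTINUE_SEARCH;  // let other handlers run as usual
}

// Call once at startup; returns false if registration failed.
bool InstallCrashHandler() {
    HMODULE dbghelp = LoadLibraryA("dbghelp.dll");
    if (dbghelp) {
        g_writeDump = (MiniDumpWriteDumpFn)GetProcAddress(dbghelp, "MiniDumpWriteDump");
    }
    return AddVectoredExceptionHandler(1, CrashHandler) != NULL;
}
#endif
```

Calling InstallCrashHandler() early in startup would be the intended usage. Writing a dump from inside the faulting thread may itself fail after a stack overflow, so treat that case as best effort.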

PiotrLegnica
Thanks. I have added some handlers using AddVectoredExceptionHandler, hopefully I'll be able to get this error when it happens. The crash reporting tool I'm using (madExcept) uses SetUnhandledExceptionFilter, so maybe somewhere it's being trapped before and that's why I'm not able to get anything from it.
Rodrigo Gómez
By adding the VectoredExceptionHandler I was able to catch the exception that is happening, and now I have a callstack that I can work with. Thanks for the suggestion!
Rodrigo Gómez
+1  A: 

I've seen this happen to C++ code twice before:

  1. When dynamically loading a Windows API using LoadLibrary and GetProcAddress and then calling it through a function pointer declared with the wrong calling convention (it should have had __stdcall but didn't).

  2. Where a class had a function pointer as a member variable, and the function pointer was called before having been initialised.
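A small sketch of guarding against the second case, written to stay within pre-C++11 syntax. The names ProgressCallback, Worker and EchoPercent are hypothetical. For the first case, the typedef used with GetProcAddress would additionally need the WINAPI (__stdcall) calling convention, which is not shown here.

```cpp
#include <cstddef>

// Hypothetical callback type used only for this illustration.
typedef int (*ProgressCallback)(int percent);

class Worker {
public:
    // Initialise the pointer so that it never holds an indeterminate value.
    Worker() : callback_(NULL) {}

    void SetCallback(ProgressCallback cb) { callback_ = cb; }

    // Returns the callback's result, or -1 if no callback has been set.
    int Report(int percent) const {
        if (callback_ == NULL) return -1;
        return callback_(percent);
    }

private:
    ProgressCallback callback_;
};

// Example callback that simply echoes its argument.
int EchoPercent(int percent) { return percent; }
```

With this shape, calling Report before SetCallback returns an error value instead of jumping through a garbage address.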

RichieHindle
I'm not using any Windows API directly with LoadLibrary, but I'll try to see if some of the third party stuff I have uses it. The function pointers... I do have some of them around my code. I will have to re-check them to see if everything is ok there.
Rodrigo Gómez
A: 

There are two more things that I have seen do this in the past that you might consider: a stack overflow (infinite recursion, bad parameters causing big temporary variables to be allocated on the stack, etc.), or an unhandled exception in a secondary thread.

IanH
At first I thought about the stack overflow option, but when I got the return code from the times it disappeared, and it returned the C0000005 code, I moved on to the AV idea. It could still be a possibility, and certainly the only times I saw this before were because of a stack overflow, but I don't have any recursion in my own code, and there are no big variables being passed around in this app.
Rodrigo Gómez
I really don't like the idea of an AV that can cause this behaviour that is not picked up by MadExcept. Are you fully up to date? If so, you should ask Matthias about the specific code you have found - he may have some useful ideas. Good luck.
IanH
A: 

Configure your app to write a minidump in case of a crash.

I am not sure how this works in CBuilder, but in Visual Studio you can load these dumps directly and it shows you a complete callstack and the source code line that caused the crash.

I used this a lot to find the cause for crashes that happened on customer machines.
However, especially for multi-threaded applications, it is likely that the real error (e.g. memory was released too early) happened a while before the actual crash, so it may still be very difficult to find the root cause.

Wolfgang
His app is disappearing, not crashing.
RichieHindle
The "access violation" he mentions qualifies as crash for me. :-) Maybe there is some kind of handler in place which catches this and just exits. In that case it would be actually more helpful if the program just crashes and could produce a dump file.
Wolfgang
What did you think happened to the app when it "disappeared"? It crashed - it just didn't report anything on the screen. Tada. Or do you have another explanation for an application suddenly "disappearing"? Aliens? Weekend? Dating with another, cute program maybe?
karx11erx
There is no support, at least that I remember, for an automated crash report like the one on VStudio. I have already some tools for this, but they don't catch it. I'm going to try with the AddVectoredExceptionHandler to see if it can catch the problem, and then create a dump from there.
Rodrigo Gómez
+1  A: 

You have to run your application in debug mode and stress test it by running a set of complicated scenarios over and over, so that you can catch the exception in the debugger.

Also review the code that accesses memory shared between your threads; the problem may come from multithreading. You can try putting locks around every shared memory access to confirm that multithreading is the reason (but this will decrease performance).

Ahmed Said
You don't really have to stress the app: just run it in debug mode. The debugger should catch the end of execution even if it's an "exit()", and with luck you will have the full stack trace to work out what went wrong, or at least an address you can set a breakpoint on so you catch it next time.
James Anderson
+1  A: 

Re: the app disappearing, do the customer machines have the "Report errors" setting turned off in Windows? It's buried in the "System" control panel, and when it's turned off the normal Windows crash notification dialog is suppressed.

Andrew Medico
I'm not sure about the customers' machines, but here in the machine where this happened that setting is not turned off.
Rodrigo Gómez
A: 

Maybe adding a good, old fashioned signal handler might at least give some indication of what happened?
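A rough sketch of what such a handler could look like with the standard C++ signal API. The names g_lastSignal, OnFatalSignal and InstallSignalHandlers are hypothetical, and whether the runtime actually delivers SIGSEGV for an access violation in this particular app is an assumption that would need checking.

```cpp
#include <csignal>

// Last signal seen by the handler; sig_atomic_t is safe to write from a handler.
volatile std::sig_atomic_t g_lastSignal = 0;

// This sketch only records the signal number. After a real fault, a handler
// should write its log and terminate the process rather than return.
extern "C" void OnFatalSignal(int sig) {
    g_lastSignal = sig;
}

// Registers the handler for segmentation faults and aborts.
// Returns false if either registration fails.
bool InstallSignalHandlers() {
    return std::signal(SIGSEGV, OnFatalSignal) != SIG_ERR
        && std::signal(SIGABRT, OnFatalSignal) != SIG_ERR;
}
```

On some platforms the handler is reset to the default after it runs once, so it may need to be re-registered if more than one signal is expected.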

karx11erx
+3  A: 

I've come across issues like this a couple times, where the application seems to simply stop. No exception handlers or crash handlers or such are invoked. The app simply seems to terminate instantly.

Unfortunately, I can't offer any easy advice on how to figure it out. The other responses here have some good ideas. If you don't already have something to catch unhandled exceptions as per PiotrLegnica's response, then you should do so.

However, if the program is truly terminating instantly like the times I've seen this, then even a handler registered with SetUnhandledExceptionFilter won't help. The program is stopping all execution and dropping out of memory before the handler is ever invoked.

A few ideas do come to mind though:

  • Check your codebase for any usage of TerminateProcess or TerminateThread. I could be wrong but I believe usage of these might be able to cause the symptoms you're seeing.
  • Check any usage of function pointers, including callbacks and WindowProcs passed to Windows APIs. Make sure that the calling conventions, parameter lists, and return values all match correctly. If a function pointer is being cast to make the code compile, it may be hiding a mismatch that could be causing bad things to happen.
  • Consider any 3rd-party libraries or components (ActiveX or such) that you're using. Maybe they have a bug in their own code causing this problem in obscure situations. You could try placing logging statements before and after calls to their functions to see if that can pin down where the program stops.
  • And if nothing else helps, put more logging throughout your own code.

And on the subject of logging: When I had to help track down a problem like this at my job, we ended up making a logging mechanism that would create a uniquely named log each time the program started and would delete it if the program ended normally. That way, another log file would be left in existence each time the termination problem occurred. We used a date-time stamp as part of the unique naming aspect. The content of the logs was simply a record of which actions were happening in the program. We went through several iterations of examining logs and then adding more logging statements until this finally led us to the source. And while tracking down the problem, this mechanism gave us a very clear idea of just how frequently the problem was occurring. You might consider something similar.
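The self-deleting log described above could be sketched roughly like this. The class name RunLog, its methods and the FileExists helper are hypothetical, and the file naming is an assumption: a timestamp with one-second resolution, so two starts within the same second would share a file.

```cpp
#include <cstdio>
#include <ctime>
#include <fstream>
#include <string>

class RunLog {
public:
    // Creates "<prefix>-YYYYMMDD-HHMMSS.log" in the current directory.
    explicit RunLog(const std::string& prefix) {
        char stamp[32];
        std::time_t now = std::time(NULL);
        std::strftime(stamp, sizeof(stamp), "%Y%m%d-%H%M%S", std::localtime(&now));
        path_ = prefix + "-" + stamp + ".log";
        out_.open(path_.c_str());
    }

    // Records one action and flushes, so the line survives a sudden exit.
    void Write(const std::string& action) {
        out_ << action << '\n';
        out_.flush();
    }

    // Call only on normal shutdown: closes and deletes the log file.
    void MarkCleanExit() {
        out_.close();
        std::remove(path_.c_str());
    }

    const std::string& Path() const { return path_; }

private:
    std::string path_;
    std::ofstream out_;
};

// Helper for checking which logs were left behind.
bool FileExists(const std::string& path) {
    std::ifstream in(path.c_str());
    return in.good();
}
```

Any log file still present at the next start then marks a run that did not end normally, and its last line shows the last recorded action.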

TheUndeadFish
We already have a log that stores several runs of my app. Actually it can be configured for a maximum file size; the default is 100 MB IIRC. We write a lot of things there (using a "debug level" configured in the app itself, which can be changed at runtime), but unfortunately I have never seen any kind of pattern when it just disappears that would let me trace this to the source.
Rodrigo Gómez
+1 for the self-deleting log - great idea!
RichieHindle
A: 

Subscribe to Windows Error Reporting. Chances are that some of your customers will report the AV to Microsoft, who'll happily share the collected stack traces with you. As a benefit, you get hard figures on the reliability of your application. Management loves those. E.g. you can set a goal of "reducing the frequency of errors by 50% by 2010".

MSalters
A: 

Start your app, then attach WinDbg (in crash mode) so that a dump is generated at the first occurrence of a second-chance exception. Remember to make the symbol files (PDB) available.

lsalamon
A: 

You've got a tough situation if you can't reproduce it locally. Capturing a crash dump, or catching the program in the act with a debugger is certainly your best option, as others have suggested.

If this were my problem, I'd try monitoring with Process Monitor from Sysinternals. Set it up to watch only your process, and make sure it's backed by a file if it will take a long time. This might tell you which thread is active and what is happening when the process ends. You might also try to find the equivalent of 'truss' for Windows, that is, a program to monitor system calls.

Darin