What's the best way to detect an application crash in XP (it produces the same pair of 'error' windows each time, each with the same window title) and then restart the application?

I'm especially interested to hear of solutions that use minimal system resources as the system in question is quite old.

I had thought of using a scripting language like AutoIt (http://www.autoitscript.com/autoit3/), and perhaps triggering a 'detector' script every few minutes?

Would this be better done in Python, Perl, PowerShell or something else entirely?

Any ideas, tips, or thoughts much appreciated.

EDIT: It doesn't actually crash (i.e. exit/terminate - thanks @tialaramex). It displays a dialog waiting for user input, followed by another dialog waiting for further user input, then it actually exits. It's these dialogs that I'd like to detect and deal with.

+2  A: 

How about creating a wrapper application that launches the faulty app as a child and waits for it? If the exit code of the child indicates an error, then restart it, else exit.
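
A minimal sketch of such a wrapper in Python, assuming the faulty app reports failure through a non-zero exit code. The names `supervise`, `max_restarts` and `delay` are made up for this illustration and do not come from any library.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, delay=1.0):
    """Run cmd and restart it while it exits with a non-zero code.

    Returns the list of exit codes that were observed, oldest first.
    """
    codes = []
    while True:
        # subprocess.call blocks until the child exits, so the wrapper
        # uses no CPU while the application is running.
        code = subprocess.call(cmd)
        codes.append(code)
        # Stop on a clean exit, or once the restart budget is used up.
        if code == 0 or len(codes) > max_restarts:
            return codes
        time.sleep(delay)  # short pause so a broken app cannot spin the CPU
```

It could be called as `supervise(['faulty.exe'])`, where `faulty.exe` is a placeholder. Note that if the app shows error dialogs before exiting, as described in the question, the wrapper only sees the exit code once those dialogs have been dismissed.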

Vinko Vrsalovic
+4  A: 

The best way is to use a named mutex.

  1. Start your application.
  2. Create a new named mutex and take ownership of it.
  3. Start a new process (a process, not a thread) or a new application, whichever you prefer.
  4. From that process or application, try to acquire the mutex. The process will block.
  5. When the application finishes, release the mutex (signal it).
  6. The "control" process will only acquire the mutex if the application either finishes or crashes.
  7. Test the resulting state after acquiring the mutex. If the application crashed, it will be WAIT_ABANDONED.

Explanation: When a thread finishes without releasing the mutex, any other process waiting for it can acquire it, but it will get WAIT_ABANDONED as the return value, meaning the mutex was abandoned and therefore the state of the section it was protecting may be unsafe.

This way your second app won't consume any CPU cycles, as it will simply keep waiting for the mutex (and that's entirely handled by the operating system).
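
The steps above could be sketched in Python with ctypes roughly as follows. This assumes the monitored application can be changed to call `take_ownership` at startup, and that the control process is started only after that has happened. `MUTEX_NAME`, `take_ownership`, `describe_wait_result` and `wait_for_application` are names invented for this sketch; the Win32 calls themselves (`CreateMutexW`, `WaitForSingleObject`, `ReleaseMutex`, `CloseHandle`) are the real kernel32 functions.

```python
import ctypes

# Win32 constants (values from the Windows SDK headers).
WAIT_OBJECT_0 = 0x00000000
WAIT_ABANDONED = 0x00000080
INFINITE = 0xFFFFFFFF

MUTEX_NAME = "MyAppAliveMutex"  # hypothetical name shared by both processes


def _kernel32():
    """Load kernel32 lazily and declare the prototypes used below (Windows only)."""
    k32 = ctypes.windll.kernel32
    k32.CreateMutexW.restype = ctypes.c_void_p
    k32.CreateMutexW.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_wchar_p]
    k32.WaitForSingleObject.restype = ctypes.c_uint
    k32.WaitForSingleObject.argtypes = [ctypes.c_void_p, ctypes.c_uint]
    k32.ReleaseMutex.argtypes = [ctypes.c_void_p]
    k32.CloseHandle.argtypes = [ctypes.c_void_p]
    return k32


def take_ownership(name=MUTEX_NAME):
    """Application side (step 2): create the mutex and own it until exit."""
    handle = _kernel32().CreateMutexW(None, True, name)
    if not handle:
        raise ctypes.WinError()
    return handle  # keep it open; release it only on a clean shutdown


def describe_wait_result(result):
    """Translate the WaitForSingleObject return value (step 7)."""
    if result == WAIT_ABANDONED:
        return "crashed"   # the owner ended without releasing the mutex
    if result == WAIT_OBJECT_0:
        return "finished"  # the owner released the mutex normally
    return "error"         # WAIT_FAILED or another unexpected value


def wait_for_application(name=MUTEX_NAME):
    """Control side (steps 4 to 7): block until the application lets go."""
    k32 = _kernel32()
    handle = k32.CreateMutexW(None, False, name)  # opens the existing mutex
    if not handle:
        raise ctypes.WinError()
    try:
        result = k32.WaitForSingleObject(handle, INFINITE)
        if result in (WAIT_OBJECT_0, WAIT_ABANDONED):
            k32.ReleaseMutex(handle)  # we own it now, so give it back
        return describe_wait_result(result)
    finally:
        k32.CloseHandle(handle)
```

The control process would then restart the application whenever `wait_for_application()` returns `"crashed"`.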

Jorge Córdoba
+1  A: 

I realize that you're dealing with Windows XP, but for people in a similar situation under Vista, there are new crash recovery APIs available. Here's a good introduction to what they can do.

Eclipse
+1  A: 

I think the main problem is that Dr. Watson displays a dialog and keeps your process alive.

You can write your own debugger using the Windows API and run the crashing application from there. This will prevent other debuggers from catching the crash of your application and you could also catch the Exception event.

Since I have not found any sample code, I have written this quick-and-dirty Python sample. I am not sure how robust it is; in particular, the declaration of DEBUG_EVENT could be improved.

from ctypes import windll, c_uint, c_byte, Structure, pointer
import subprocess

WaitForDebugEvent = windll.kernel32.WaitForDebugEvent
ContinueDebugEvent = windll.kernel32.ContinueDebugEvent
DEBUG_PROCESS = 0x00000001  # process creation flag: debug the new child
DBG_CONTINUE = 0x00010002
DBG_EXCEPTION_NOT_HANDLED = 0x80010001

event_names = {
    1: 'EXCEPTION_DEBUG_EVENT',
    2: 'CREATE_THREAD_DEBUG_EVENT',
    3: 'CREATE_PROCESS_DEBUG_EVENT',
    4: 'EXIT_THREAD_DEBUG_EVENT',
    5: 'EXIT_PROCESS_DEBUG_EVENT',
    6: 'LOAD_DLL_DEBUG_EVENT',
    7: 'UNLOAD_DLL_DEBUG_EVENT',
    8: 'OUTPUT_DEBUG_STRING_EVENT',
    9: 'RIP_EVENT',
}

class DEBUG_EVENT(Structure):
    # Only the three header fields are read here. 'u' stands in for the
    # event-specific union and is deliberately oversized, so that
    # WaitForDebugEvent never writes past the end of the buffer.
    _fields_ = [
        ('dwDebugEventCode', c_uint),
        ('dwProcessId', c_uint),
        ('dwThreadId', c_uint),
        ('u', c_byte * 256)]

def run_with_debugger(args):
    proc = subprocess.Popen(args, creationflags=DEBUG_PROCESS)
    event = DEBUG_EVENT()

    while True:
        # Wait up to 10 ms for the next debug event from the child.
        if WaitForDebugEvent(pointer(event), 10):
            print(event_names.get(event.dwDebugEventCode,
                    'Unknown Event %s' % event.dwDebugEventCode))
            # Every event, exceptions included, is simply resumed.
            ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE)
        retcode = proc.poll()
        if retcode is not None:
            return retcode

run_with_debugger(['python', 'crash.py'])
Leonhard
+1  A: 

Here is a slightly improved version.

In my test, the previous code ran in an infinite loop when the faulty exe generated an "access violation".

I'm not totally satisfied with my solution because I have no clear criterion for deciding which exceptions should be continued and which should not (ExceptionFlags is of no help).

But it works on the example I ran.

Hope it helps, Vivian De Smedt

from ctypes import windll, c_uint, c_void_p, Structure, Union, pointer
import subprocess

WaitForDebugEvent = windll.kernel32.WaitForDebugEvent
ContinueDebugEvent = windll.kernel32.ContinueDebugEvent
DBG_CONTINUE = 0x00010002
DBG_EXCEPTION_NOT_HANDLED = 0x80010001

event_names = {
    1: 'EXCEPTION_DEBUG_EVENT',
    2: 'CREATE_THREAD_DEBUG_EVENT',
    3: 'CREATE_PROCESS_DEBUG_EVENT',
    4: 'EXIT_THREAD_DEBUG_EVENT',
    5: 'EXIT_PROCESS_DEBUG_EVENT',
    6: 'LOAD_DLL_DEBUG_EVENT',
    7: 'UNLOAD_DLL_DEBUG_EVENT',
    8: 'OUTPUT_DEBUG_STRING_EVENT',
    9: 'RIP_EVENT',
}

EXCEPTION_MAXIMUM_PARAMETERS = 15

EXCEPTION_DATATYPE_MISALIGNMENT  = 0x80000002
EXCEPTION_ACCESS_VIOLATION       = 0xC0000005
EXCEPTION_ILLEGAL_INSTRUCTION    = 0xC000001D
EXCEPTION_ARRAY_BOUNDS_EXCEEDED  = 0xC000008C
EXCEPTION_INT_DIVIDE_BY_ZERO     = 0xC0000094
EXCEPTION_INT_OVERFLOW           = 0xC0000095
EXCEPTION_STACK_OVERFLOW         = 0xC00000FD


# Mirrors the Win32 EXCEPTION_RECORD structure.
class EXCEPTION_RECORD(Structure):
    _fields_ = [
        ("ExceptionCode", c_uint),
        ("ExceptionFlags", c_uint),
        ("ExceptionRecord", c_void_p),   # pointer to a chained record, if any
        ("ExceptionAddress", c_void_p),
        ("NumberParameters", c_uint),
        ("ExceptionInformation", c_void_p * EXCEPTION_MAXIMUM_PARAMETERS),
    ]

# Payload of an EXCEPTION_DEBUG_EVENT.
class EXCEPTION_DEBUG_INFO(Structure):
    _fields_ = [
        ('ExceptionRecord', EXCEPTION_RECORD),
        ('dwFirstChance', c_uint),
    ]

class DEBUG_EVENT_INFO(Union):
    _fields_ = [
        ("Exception", EXCEPTION_DEBUG_INFO),
    ]

class DEBUG_EVENT(Structure):
    _fields_ = [
        ('dwDebugEventCode', c_uint),
        ('dwProcessId', c_uint),
        ('dwThreadId', c_uint),
        ('u', DEBUG_EVENT_INFO)
    ]

def run_with_debugger(args):
    proc = subprocess.Popen(args, creationflags=1)  # 1 = DEBUG_PROCESS
    event = DEBUG_EVENT()

    num_exception = 0

    while True:
        if WaitForDebugEvent(pointer(event), 10):
            print(event_names.get(event.dwDebugEventCode, 'Unknown Event %s' % event.dwDebugEventCode))

            if event.dwDebugEventCode == 1:
                num_exception += 1

                exception_code = event.u.Exception.ExceptionRecord.ExceptionCode

                if exception_code == 0x80000003:  # EXCEPTION_BREAKPOINT
                    # A debugged process raises a breakpoint at startup; keep going.
                    print("Breakpoint exception: %s" % hex(exception_code))

                else:
                    if exception_code == EXCEPTION_ACCESS_VIOLATION:
                        print("EXCEPTION_ACCESS_VIOLATION")

                    elif exception_code == EXCEPTION_INT_DIVIDE_BY_ZERO:
                        print("EXCEPTION_INT_DIVIDE_BY_ZERO")

                    elif exception_code == EXCEPTION_STACK_OVERFLOW:
                        print("EXCEPTION_STACK_OVERFLOW")

                    else:
                        print("Other exception: %s" % hex(exception_code))

                    break

            ContinueDebugEvent(event.dwProcessId, event.dwThreadId, DBG_CONTINUE)

        retcode = proc.poll()
        if retcode is not None:
            return retcode

run_with_debugger(['crash.exe'])
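
On the open question of which exceptions to continue: one common criterion, offered here only as a sketch, is the dwFirstChance field rather than ExceptionFlags. A first-chance exception is handed back to the application with DBG_EXCEPTION_NOT_HANDLED so that its own handlers can run; if the same exception comes back as a second chance (dwFirstChance is 0), nothing handled it and the application can be treated as crashed. The helper name `classify_exception` is made up for this illustration.

```python
DBG_CONTINUE = 0x00010002
DBG_EXCEPTION_NOT_HANDLED = 0x80010001
EXCEPTION_BREAKPOINT = 0x80000003


def classify_exception(exception_code, first_chance):
    """Decide how to resume after one EXCEPTION_DEBUG_EVENT.

    Returns (continue_status, crashed), where continue_status is the value
    to pass to ContinueDebugEvent and crashed tells the caller whether the
    application should be considered dead.
    """
    if exception_code == EXCEPTION_BREAKPOINT and first_chance:
        # Includes the breakpoint raised when a debugged process starts.
        return DBG_CONTINUE, False
    if first_chance:
        # Let the application's own exception handlers try first.
        return DBG_EXCEPTION_NOT_HANDLED, False
    # Second chance: no handler dealt with it, so this is a real crash.
    return DBG_EXCEPTION_NOT_HANDLED, True
```

Inside the debug loop this could be fed with `event.u.Exception.ExceptionRecord.ExceptionCode` and `event.u.Exception.dwFirstChance`; when `crashed` is true, the caller could kill the child and start it again.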