I wrote a Windows service in C++ that needs to restart every night at midnight, so I call exit(1) so that the SCM restarts it. The problem is that roughly every other night it starts up only partially and hangs. In the event log, I get this:

Application popup -Application Error: The instruction at "0x0043c145" referenced memory at "0x00000035". The memory could not be "read".

It seems to fail right before opening an ODBC connection to a SQL Server 2008 database. I can confirm that the service actually exits before it restarts. Nevertheless, I get this error every once in a while when it stops and restarts itself. If I stop and restart the service manually, over and over, I can never get it to fail, and if I control the process from a terminal port and exit manually from there, it never fails either.

If I try to attach a debugger the process quits, so I can't glean any useful information that way either.

I'm tearing my hair out trying to figure out what's going on, but I don't know where to start. Anyone have any ideas?

A: 

Not a direct answer, but if you are on Vista (or later, I think), there is something you can try:

"A service notifies the SCM to queue a failure action by entering the SERVICE_STOPPED state and setting SetServiceExitCode function's dwWin32ExitCode parameter to anything other than ERROR_SUCCESS."

Windows Vista introduced a new flag, FailureActionsOnNonCrashFailures, which services set if they want to be able to notify the SCM to initiate a failure action. See more in Vista services.

pinichi
Unfortunately I'm using Windows Server 2003. Should have mentioned that, sorry, but this info will be useful in the future.
jjacksonRIAB
A: 

Set up automatic generation of process dumps on this process using Process Dumper. You should be able to debug the dump post mortem to work out why this sporadic exception is happening.

It would also be useful to add diagnostics about your DB access to see what progress has been made before the exception happens. I wonder if your exit/restart strategy would work better if you scheduled a task to shut down the service cleanly at midnight and then start it up again once shutdown is complete.

Perhaps ODBC on the box is getting into a strange state after the prior shutdown using exit(1). When you try to repro this, you say you stop and start it. Is the exit mode the same in that case? Can you introduce a short delay between your exit and restart on the target server to allow ODBC connection state to clean up?
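If changing the restart delay in the SCM is awkward, one way to try the same experiment is to pause inside the service before the first database connection. This is only a sketch under that assumption. The `connect_after_grace_period` helper and the caller-supplied connect step are hypothetical, and the right length for the pause would have to be found by trial.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: wait for a grace period, then run the caller's connect step.
// The connect callable stands in for whatever code opens the ODBC connection.
bool connect_after_grace_period(std::chrono::milliseconds grace,
                                const std::function<bool()>& connect) {
    std::this_thread::sleep_for(grace);  // sleeps for at least the requested time
    return connect();
}
```

If the hang stops appearing with a long enough pause, that would point toward leftover connection state from the previous process rather than a bug in the startup code itself.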

Steve Townsend
I installed Process Explorer on the machine and I'm going to try to get a full dump of the process after it hangs and load that into Visual Studio. It seemed to work when I tested it; now I just need to grab it at the right time and see where it went wrong.
jjacksonRIAB
The exit mode is always the same. This is a slow-starting process because it caches numerous things from the database. It can take 6-10 seconds by itself to start. Since I'm calling exit(1), the SCM should wait a minute before restarting it, since it treats that as a failure condition. I can lengthen that to see what happens, but I'm not sure it will help, because if I do it manually through the SCM, whether with restart or start, it starts as it should.
jjacksonRIAB