I'm very interested in the answer to another question regarding watchdog timers for Windows services (see here). That answer stated:

I have also used an internal watchdog system running in another thread. That thread looks at the main thread for activity like log output or a toggling event. If the activity is not seen, then the service is considered hung and I shut down the service.

In this case you can configure Windows to auto-restart a stopped service, and that might clear the problem (as long as it's not an internal logic bug).

Also, the services I work with write text logs. In addition, for services that are about to "sleep for a bit", I log the time of the next wake-up. I use MTAIL to watch a log for output.

Could anyone give some sample code showing how to use an internal watchdog running in another thread? I currently have a task to develop a Windows service that can restart itself if it fails, hangs, etc.

I really appreciate your help.

+3  A: 

You can configure the service to restart itself on failure from its properties dialog:

Services -> right-click your service -> Properties -> Recovery tab -> First failure: Restart the Service -> Second failure: Restart the Service -> Subsequent failures: Restart the Service
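
The same settings can be scripted instead of clicked through. Below is a minimal sketch that builds the `sc failure` command line; the service name `MyService`, the 60-second restart delay, and the one-day reset period are placeholder assumptions, and the helper names are made up for illustration. As far as I know, these recovery actions fire when the service process dies unexpectedly rather than when it stops cleanly, so treat that as something to verify for your setup.

```python
import subprocess


def build_recovery_command(service_name, restart_delay_ms=60000, reset_after_s=86400):
    """Return the sc.exe argument list that restarts the service on every failure."""
    # Three "restart/<delay>" pairs: first, second, and subsequent failures.
    actions = "/".join(["restart", str(restart_delay_ms)] * 3)
    # "reset=" and "actions=" are passed as separate tokens from their values.
    return ["sc", "failure", service_name,
            "reset=", str(reset_after_s),
            "actions=", actions]


def apply_recovery(service_name):
    """Run the command; this needs Windows and an elevated prompt."""
    subprocess.run(build_recovery_command(service_name), check=True)
```

Calling `apply_recovery("MyService")` from an administrator prompt should be equivalent to filling in the Recovery tab by hand.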
ArsenMkrt
Might work for some services, but what about a service that normally runs fine but suddenly gets stuck and stops making progress?
romkyns
+2  A: 

I'm not a big fan of running a watchdog as a thread in the process you're watching. That means if the whole process hangs for some reason, the watchdog won't work.

Watchdogs are an idea lifted from the hardware world, and they had it right. Use an external circuit that is as simple as possible (so it can be provably correct). Typical watchdogs simply ran a timer and, if the process hadn't done something before the timer expired (like access a memory location the watchdog was watching), the whole thing was reset. When the watchdog was "kicked", it would restart the timer.

The act of the process kicking the watchdog protected that process from summary termination.
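
To make the kick-and-timer mechanism concrete, here is a rough sketch of it in Python. It runs the timer as a thread inside the watched process purely to keep the example self-contained, so it has exactly the weakness described above: if the whole process hangs, so does the watchdog. The `Watchdog` class and its parameter names are invented for this example.

```python
import threading
import time


class Watchdog:
    """Calls on_expire once if kick() is not called within `timeout` seconds."""

    def __init__(self, timeout, on_expire, poll_interval=0.05):
        self.timeout = timeout
        self.on_expire = on_expire
        self.poll_interval = poll_interval
        self._last_kick = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.kick()
        self._thread.start()

    def kick(self):
        """Restart the timer; the watched code calls this to prove it is alive."""
        with self._lock:
            self._last_kick = time.monotonic()

    def stop(self):
        """Stop watching; call only after start()."""
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait returns False on timeout and True once stop() is called.
        while not self._stop.wait(self.poll_interval):
            with self._lock:
                idle = time.monotonic() - self._last_kick
            if idle > self.timeout:
                self.on_expire()
                return
```

In a service, `on_expire` could be something like `lambda: os._exit(1)` so the process dies abruptly and Windows gets the chance to restart it; that choice is an assumption, not something the thread prescribes.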

My advice would be to write a very simple stand-alone program which just monitors an event (such as a file's modification time being updated). If that event doesn't occur within the required time, kill the process being watched (and let Windows restart it).

Then have your watched program periodically rewrite that file.
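
A minimal sketch of that file-based scheme might look like the following. The watched program calls `touch_heartbeat`, and the stand-alone watchdog runs `watch`. The function names, the polling interval, and the decision to treat a missing file as stale are all assumptions made for the example.

```python
import os
import signal
import time
from pathlib import Path


def touch_heartbeat(path):
    """Called by the watched program: update the file's modification time."""
    Path(path).touch()


def is_stale(path, timeout, now=None):
    """Return True if the heartbeat file is missing or older than `timeout` seconds."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) > timeout
    except OSError:
        # Missing or unreadable file counts as stale (an assumption; a real
        # watchdog may want a grace period while the service starts up).
        return True


def watch(path, pid, timeout, poll_interval=5.0):
    """Poll the heartbeat file and kill the watched process once it goes stale."""
    while not is_stale(path, timeout):
        time.sleep(poll_interval)
    # On Windows, os.kill with SIGTERM terminates the process unconditionally.
    os.kill(pid, signal.SIGTERM)
```

Keeping the staleness check in its own function leaves the watchdog loop small enough to read at a glance, which is the point of making the watchdog as simple as possible.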

paxdiablo
+2  A: 

Besides regularly modifying the last-write time of a file, other approaches you might want to consider are creating a proper performance counter or even a WMI object. We do the latter in our build infrastructure. The 'trick' is to find a meaningful work unit in the service being monitored and pulse your 'heartbeat' each time a unit is finished.

The advantage of WMI or perf counters over the file approach is that you then become visible to a whole bunch of professional MIS / management tools. This can add a lot of value.

Tom Kirby-Green
A: 

There are Windows service monitoring apps out there that take care of what you want to accomplish. Also, if more than one machine needs to be monitored, be sure to remove the network from the equation and monitor each machine separately. This question has a good point regarding that.

ExtraLean