Watchdogs are great, but they are also a pain in the rear when your program or system does not fit them easily. They work best when you have code that looks (generally) like:
Watchdog_init();
hardware_init();
subsystem1_init();
subsystem2_init();
subsystem3_init();
...
subsystemN_init();
forever {
Watchdog_tickle();
subsystem1_work();
subsystem2_work();
subsystem3_work();
...
subsystemN_work();
}
Very often you can design your program in such a way that this works, and generally it is very foolproof (but not totally).
But in cases like yours this does not work so well. You end up having to design and create (or possibly use a library for) a framework with various conditions that must be met, and these control if and when the watchdog gets petted. This can be very tricky, though. The complexity of this code could itself introduce errors. You could write a perfect application apart from the watchdog framework and find that your project resets a lot, or all of your code might be bad and just continually pet the watchdog, so that it never resets.
One good way to change the above code to handle more complicated situations would be to change the subsystemX_work functions to keep track of state. This can be done with static variables in the functions, or by using function pointers rather than functions and changing the actual function that is executed to reflect the current state of that subsystem. Each subsystem becomes a state machine.
Another way to work around long intentional waits with a quick-biting watchdog is to break up the long-running function into shorter pieces. Rather than:
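As a rough sketch of the function pointer approach, something like the following could work. All of the names here (struct subsystem, state_start, state_wait, state_run, polls_left) are made up for illustration, and the countdown is only a stand-in for polling real hardware:

```c
#include <assert.h>
#include <stddef.h>

struct subsystem;
typedef void (*state_fn)(struct subsystem *);

/* Hypothetical subsystem: each call does one short slice of work and returns. */
struct subsystem {
    state_fn step;    /* function to run on the next pass of the main loop */
    int polls_left;   /* stand-in for "the device is still busy" */
    int ready;        /* set once the subsystem is doing normal work */
};

static void state_wait(struct subsystem *s);
static void state_run(struct subsystem *s);

/* Kick off the slow operation, then switch to the waiting state. */
static void state_start(struct subsystem *s) {
    s->polls_left = 3;   /* assumption: pretend the device needs 3 polls */
    s->step = state_wait;
}

/* Poll once per pass instead of blocking until the device is done. */
static void state_wait(struct subsystem *s) {
    if (--s->polls_left == 0) {
        s->step = state_run;
    }
}

/* Normal operation. */
static void state_run(struct subsystem *s) {
    s->ready = 1;
}

/* This is what a subsystemX_work() in the main loop would boil down to. */
static void subsystem_work(struct subsystem *s) {
    assert(s != NULL && s->step != NULL);
    s->step(s);
}
```

Because every state function returns quickly, the main loop gets back to Watchdog_tickle() on every pass, even while a slow device is still coming up.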
slow_device_init();
Watchdog_tickle();
You could do:
slow_device_init_begin();
Watchdog_tickle();
slow_device_init_finish();
Watchdog_tickle();
And then extend this to stretch the watchdog timer by doing:
slow_device_init_begin();
for ( i = SLOW_DEV_TRIES; i ; i--) {
Watchdog_tickle();
if (slow_device_init_done()) {
break;
}
}
Watchdog_tickle();
Even so, it can get more and more complicated. Often you end up having to create a watchdog delegate that just checks whether certain conditions are met and pets the watchdog, or not, based on those conditions. This begins to get very complicated. It can be implemented by making an object for each of your subsystems that has some method/function to call to test the subsystem's health. The health methods could be very complex and could even change as the state of that subsystem changes, but they should be as simple as possible. Simple checks are easier to verify, and any change to how the subsystem works will require a change to how you measure its health.
If you can ensure that some code runs at regular intervals then you could just have an integer for each subsystem that acts as the subsystem's local watchdog. Some code (maybe in a timer interrupt handler, but not necessarily) will decrement and test each subsystem's variable. If it drops below 0 for any subsystem then the watchdog is not tickled.
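A minimal sketch of such a delegate in C might look like this. The struct monitored type, the uart_healthy check, and the stubbed Watchdog_tickle are all assumptions for the sake of the example, not part of any particular library:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stub standing in for the real hardware pet; it only counts calls. */
static unsigned tickle_count;
static void Watchdog_tickle(void) { tickle_count++; }

/* One entry per subsystem: a health check plus the state it inspects. */
struct monitored {
    bool (*healthy)(const void *ctx);
    const void *ctx;
};

/* Hypothetical subsystem state and a deliberately trivial health check. */
struct uart_state { unsigned rx_overruns; };

static bool uart_healthy(const void *ctx) {
    const struct uart_state *u = ctx;
    return u->rx_overruns == 0;
}

/* Pet the hardware watchdog only if every subsystem reports healthy. */
static void watchdog_delegate(const struct monitored *list, size_t n) {
    assert(list != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!list[i].healthy(list[i].ctx)) {
            return;   /* withhold the pet and let the hardware watchdog bite */
        }
    }
    Watchdog_tickle();
}
```

The main loop would then call watchdog_delegate() where it used to call Watchdog_tickle() directly. Keeping each check to a line or two is what makes it practical to convince yourself that the checks themselves are correct.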
void Watchdog_periodic(void) {
    // subsystem_list and SUBSYSTEM_COUNT stand in for however you keep track of your subsystems
    for (int i = 0; i < SUBSYSTEM_COUNT; i++) {
        if ( 0 > --(subsystem_list[i].count_down) ) {
            // Do something that causes a reset. Here that is returning without petting
            // the hardware watchdog, but it could be a while(1); or something else
            return;
        }
    }
    Watchdog_tickle();
}
Then each subsystem can tickle its own count_down for varying amounts of time by setting its count_down to a positive value.
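To make the subsystem side of this concrete, here is a small self-contained sketch. The subsystem IDs, the subsystem_tickle helper, and the stubbed Watchdog_tickle are hypothetical names chosen for the example, and the tick counts are arbitrary:

```c
#include <assert.h>

/* Hypothetical subsystem IDs; SUBSYS_COUNT is the number of entries. */
enum { SUBSYS_SENSOR, SUBSYS_COMMS, SUBSYS_COUNT };

/* One countdown per subsystem; volatile because a timer interrupt may touch it. */
static volatile int count_down[SUBSYS_COUNT];

/* Stub standing in for the real hardware pet; it only counts calls. */
static unsigned hw_tickles;
static void Watchdog_tickle(void) { hw_tickles++; }

/* Called by a subsystem to say "I am fine for the next `ticks` periods". */
static void subsystem_tickle(int id, int ticks) {
    assert(id >= 0 && id < SUBSYS_COUNT);
    count_down[id] = ticks;
}

/* Called at a regular interval, for example from a timer interrupt. */
static void Watchdog_periodic(void) {
    for (int i = 0; i < SUBSYS_COUNT; i++) {
        if (0 > --count_down[i]) {
            return;   /* a subsystem went quiet: stop petting the hardware */
        }
    }
    Watchdog_tickle();
}
```

A subsystem that is about to start a known slow operation can call subsystem_tickle() with a larger value, which gives it more time without loosening the deadline for everything else.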
You should also notice that this is really just a software watchdog, even though it may make use of the hardware watchdog to do the actual reset.
You should also note that the more complicated the watchdog framework is, the more opportunity there is for errors in it, as well as for errors in other code to cause it to work improperly. For instance, a pointer error such as:
int x;
fscanf(input, "%i", x); // Passed uninitialized x rather than address of x
could result in setting some subsystem's count_down value, which could end up keeping the watchdog from biting when it should.