Hi,

I am new to embedded systems programming. I took courses during my studies, but I have little practical programming experience.

Here is the problem: I have to program a small system on an NXP LPC2103 microcontroller (ARM7-based), without an operating system. It has a watchdog timer which needs to be fed regularly. The system has a GPRS modem with an embedded TCP/IP stack, and initializing it takes longer than the watchdog timeout. When I call the initialization function, the system resets.

I spoke to a more experienced colleague and he suggested that I repeatedly exit and re-enter the initialization function from the main function, feeding the watchdog timer there, until the initialization has finished. The idea sounds good, but I would also like to hear about other experiences. A reference (book or website) would be useful too, because I couldn't find anything specific to this.

I wouldn't like to call watchdog timer from the initialization function, I don't find this good.

+5  A: 

Generally, there are two approaches that I have adopted for this situation.

State Machine Initialisation

The first is much as your colleague has suggested: implement the initialisation routines as a state machine called as part of the main loop, and once initialisation is complete, stop calling the initialisation routines and start calling the main routines.

This is a simple and clean approach, but it can be a little awkward when it comes to particularly long processes such as starting up a low frequency oscillator.
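As a rough sketch of the state machine approach (not taken from any real project), the initialisation can be split into short, non-blocking steps that the main loop calls between watchdog feeds. The names init_step(), modem_ready(), watchdog_feed() and main_loop_iteration() are hypothetical, and the modem poll is simulated so the example is self-contained:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the real watchdog feed. */
unsigned feed_count = 0;
static void watchdog_feed(void) { feed_count++; }

typedef enum { INIT_START, INIT_WAIT_MODEM, INIT_CONFIGURE, INIT_DONE } init_state_t;

init_state_t init_state = INIT_START;
static unsigned modem_polls = 0;

/* Stand-in for a non-blocking "is the modem ready yet?" check;
   here it simply reports ready on the third poll. */
static bool modem_ready(void) { return ++modem_polls >= 3; }

/* Performs one short slice of initialisation; returns true once finished. */
bool init_step(void)
{
    switch (init_state) {
    case INIT_START:
        /* e.g. issue the modem reset command, then return at once */
        init_state = INIT_WAIT_MODEM;
        break;
    case INIT_WAIT_MODEM:
        if (modem_ready())
            init_state = INIT_CONFIGURE;
        break;
    case INIT_CONFIGURE:
        /* e.g. send the TCP/IP configuration */
        init_state = INIT_DONE;
        break;
    case INIT_DONE:
        break;
    }
    return init_state == INIT_DONE;
}

/* One pass of the main loop: feed the watchdog, then run either
   the next initialisation slice or the normal work. */
void main_loop_iteration(void)
{
    watchdog_feed();
    if (!init_step())
        return;
    /* normal application work goes here */
}
```

Each slice has to finish well within the watchdog period, which is why the waiting state only polls and returns instead of blocking.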

Time-Limited ISR Watchdog Handling

There is another alternative if you have a 'systick' or equivalent interrupt, for example an interrupt that is fired every 1 ms. In this situation, you can consider feeding the watchdog (e.g.) every 50 calls of the interrupt, but limiting the number of times the watchdog is fed to equate to the maximum allowable time for the initialisation routines to complete. It is then generally necessary (if you have, as you should in my opinion, a windowed watchdog) to have a short synchronisation loop at the end of the initialisation to ensure that the watchdog isn't fed before the minimum window time is reached, but this is trivial to implement.

This is quite a clean solution (as it doesn't make the initialisation routines into an unnecessary state machine) and deals with the issues of an initialisation routine hanging. It is, however, very important that the limit on watchdog calls in the ISR is enforced.
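A minimal sketch of the capped ISR feed, assuming a 1 ms tick. FEED_EVERY_TICKS, MAX_INIT_FEEDS, systick_isr(), init_finished() and watchdog_feed() are names made up for illustration, the hardware feed is replaced by a counter, and the windowed-watchdog synchronisation loop is left out:

```c
#include <stdbool.h>
#include <stdint.h>

#define FEED_EVERY_TICKS  50u   /* feed once per 50 ticks (50 ms at 1 ms per tick) */
#define MAX_INIT_FEEDS   100u   /* hard cap: roughly 5 s allowed for initialisation */

static volatile uint32_t tick_divider = 0;
static volatile uint32_t feeds_left = MAX_INIT_FEEDS;
static volatile bool init_in_progress = true;
volatile uint32_t hw_feeds = 0;  /* stand-in for the hardware feed sequence */

static void watchdog_feed(void) { hw_feeds++; }

/* Called from the periodic tick interrupt. */
void systick_isr(void)
{
    /* Once the budget is spent or initialisation has finished, the ISR
       stops feeding and the main loop has to take over. */
    if (!init_in_progress || feeds_left == 0)
        return;

    if (++tick_divider >= FEED_EVERY_TICKS) {
        tick_divider = 0;
        feeds_left--;
        watchdog_feed();
    }
}

/* Called once by main() when the initialisation routines have completed. */
void init_finished(void)
{
    init_in_progress = false;
}
```

If an initialisation routine hangs, the ISR runs out of feeds after MAX_INIT_FEEDS and the watchdog resets the system as usual.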

Discussion

Both have their advantages and disadvantages, but it's useful to have different approaches for different requirements. I tend to prefer the latter solution where I have things like a low frequency oscillator (which can take a while to start) as it avoids over-complicating the initialisation routines, which can be complicated enough on their own!

I'm sure others will offer other alternative ideas as well...

Al
+3  A: 

The Watchdog in LPC2103 is highly customizable. You have many options to control it:

You can leave it disabled until your initialization sequence is over.

You can extend the period between feeds to a very long time.

The question is: what are you using the watchdog for?

If it is used to check that your software is running well and not freezing, I don't see how the ISR option from Al will help you (an ISR can keep running even when your program is stuck).

For details about the watchdog options, see the WatchDog Timer (WDT) chapter (chapter 17) in the user manual for your MCU: http://www.nxp.com/documents/user_manual/UM10161.pdf
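To judge whether a longer period is enough, here is a rough sketch of the reload-value arithmetic. It assumes, from my reading of the manual, that the watchdog counter runs at PCLK/4, that the reload register is 32 bits wide and that the smallest usable value is 0xFF. Please verify all three against the WDT chapter. wdt_reload_for_timeout() is a hypothetical helper, not part of any NXP library:

```c
#include <stdint.h>

#define WDT_PRESCALER  4u     /* assumption: fixed PCLK/4 prescaler, check the manual */
#define WDT_MIN_COUNT  0xFFu  /* assumption: smallest usable reload value */

/* Returns the reload value for the requested timeout, clamped to the
   assumed valid range. pclk_hz is the peripheral clock in Hz. */
uint32_t wdt_reload_for_timeout(uint32_t pclk_hz, uint32_t timeout_ms)
{
    /* 64-bit intermediate so the multiplication cannot overflow */
    uint64_t ticks = ((uint64_t)pclk_hz / WDT_PRESCALER) * timeout_ms / 1000u;

    if (ticks < WDT_MIN_COUNT)
        return WDT_MIN_COUNT;
    if (ticks > UINT32_MAX)
        return UINT32_MAX;
    return (uint32_t)ticks;
}
```

With a PCLK of 15 MHz (an example figure, not taken from the question), a 10 s timeout works out to a reload value of 37,500,000, which fits easily in 32 bits.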

landmn
Changing the WD period is a viable option. And Al pointed out that you have to restrict the updates to the WD in the interrupt routine to a certain number of times, so that a hanging init routine is still detected.
ziggystar
1. The LPC2103 has no minimum window for feeding the watchdog, so you may limit how often you feed it to save processor resources, but you don't have to. 2. How do you check for hanging routines from the ISR? How is feeding the watchdog from the ISR helpful?
landmn
1. If you can extend the WD period to a sufficient duration, you won't need the ISR method. Using an ISR to artificially extend the WD period is an option available on most MCUs; it is a general solution and may not be necessary on the LPC2103 for the given task. 2. You can't check for hanging routines. But you can limit the ISR to resetting the WD only a fixed number of times (say 100) and then let it expire.
ziggystar
+1  A: 

You might reconsider where in code the WD timer is serviced.

Typically the WD timer needs to be serviced during idle time (idle loop or idle task) and in the lowest level drivers (e.g. when you are reading/writing from/to the GPRS modem or the MAC for your TCP/IP connection, etc.).

If this is not sufficient, your firmware may also be doing nothing but burning up CPU cycles in a delay routine. It's fine to add a WD timer service here, but you may have to adjust your delay timer to account for the WD service time.
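As a sketch of servicing the watchdog inside a delay routine: delay_ms(), delay_1ms(), watchdog_service() and WD_SERVICE_INTERVAL_MS are made-up names, the 1 ms busy-wait is an empty stand-in, and the interval would have to be chosen shorter than the real WD period:

```c
#include <stdint.h>

#define WD_SERVICE_INTERVAL_MS 100u  /* assumption: well below the WD period */

uint32_t wd_services = 0;            /* stand-in for the hardware service */
static void watchdog_service(void) { wd_services++; }

/* Stand-in for a calibrated 1 ms busy-wait. */
static void delay_1ms(void) { }

/* Busy-waits for ms milliseconds, servicing the watchdog every
   WD_SERVICE_INTERVAL_MS so that a long delay cannot trip it. */
void delay_ms(uint32_t ms)
{
    uint32_t since_service = 0;

    while (ms--) {
        delay_1ms();
        if (++since_service >= WD_SERVICE_INTERVAL_MS) {
            since_service = 0;
            /* The service call itself takes time; a calibrated delay
               would need to subtract it. */
            watchdog_service();
        }
    }
}
```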

If your application simply has some long, CPU intensive tasks that take more time to execute than the WD timer period allows, you might consider making the WD timer interval a bit longer. This may not always be possible, but I like to keep WD timer references out of the upper layers of the firmware to keep the application layer as portable as possible. WD timers are typically hardware dependent, therefore any WD timer references in your code are rarely portable. The low-level drivers are rarely portable anyway, so this typically is a better place to service a WD timer.

semaj
+3  A: 

I wouldn't like to call watchdog timer from the initialization function, I don't find this good.

It might be overkill for this one situation, but a general technique I've used for long running operations where you might want to perform other work is to have the long running function accept a callback function pointer that will be periodically called. The pattern that I usually use is to have a callback prototype that might look like:

typedef int (*callback_t)(void* progress, void* context);

The long running function will periodically call the callback, with some information that indicates its progress (how that progress is represented and what it means depends on the details of the particular function) and with a context value that the original caller passed in along with the callback pointer (again, what that parameter means and how it's interpreted is entirely up to the callback). Generically, the return value of the callback function might be used to indicate that the 'long running thing' should cancel or otherwise change behavior.

This way, your initialization function can take a callback pointer with a context value, and just periodically call it. Obviously, in your situation, those callbacks would have to occur often enough to keep the watchdog happy.

int watchdog_callback( void* progress, void* context)
{
    kick_the_watchdog();

    return 0;  // zero means 'keep going...'
}


void init_modem( callback_t pCallback, void* callback_context)
{
    int stage = 0;  // progress is passed by address, since the callback takes void*

    // do some stuff

    pCallback( &stage, callback_context);

    // do some other stuff

    stage = 1;
    pCallback( &stage, callback_context);

    stage = 2;
    while (waiting_for_modem()) {
         // do work...

         pCallback( &stage, callback_context);
    }
}

One nice thing about this pattern is that it can be used in different situations - you might have a function that reads or writes a large amount of data. The callback pattern might be used to have something display the progress.

Note that if you find that you have other long-running functions, the same watchdog_callback() function could be used to allow them to deal with preventing the watchdog from resetting. However, if you find yourself needing to rely on this type of thing often for the watchdog in particular, then you might need to consider how your tasks are interacting and either break them down more or use a more complex watchdog scheme that has the watchdog managed by its own task that other tasks interact with to keep the watchdog timer happy indirectly.
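The example above ignores the callback's return value. As a rough sketch of the cancel path, the context can carry a feed budget so that a stuck operation eventually stops being kept alive. struct feed_budget, budgeted_watchdog_callback() and long_operation() are hypothetical names, and the watchdog kick is replaced by a counter:

```c
#include <stddef.h>

typedef int (*callback_t)(void *progress, void *context);

/* Hypothetical context: how many more callbacks may feed the watchdog. */
struct feed_budget {
    unsigned remaining;
    unsigned fed;       /* stand-in counter instead of a real watchdog kick */
};

/* Feeds the (simulated) watchdog until the budget is spent, then asks
   the caller to stop by returning nonzero. */
int budgeted_watchdog_callback(void *progress, void *context)
{
    struct feed_budget *budget = context;

    (void)progress;              /* progress is not needed here */
    if (budget->remaining == 0)
        return 1;                /* nonzero means 'cancel' */
    budget->remaining--;
    budget->fed++;               /* real code would kick the watchdog here */
    return 0;                    /* zero means 'keep going...' */
}

/* A long running loop that honours the callback's return value.
   Returns 0 on completion, -1 if the callback cancelled it. */
int long_operation(unsigned steps, callback_t cb, void *context)
{
    for (unsigned i = 0; i < steps; i++) {
        /* one bounded slice of work would go here */
        if (cb != NULL && cb(&i, context) != 0)
            return -1;
    }
    return 0;
}
```

The caller decides what to do when long_operation() returns -1, for example retrying the modem initialisation or simply letting the watchdog expire.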

Michael Burr
I like the idea
Bogi
+1 Patterns such as this are basically a way to emulate C++ OOP benefits in plain C. What's also great about this abstraction is that you can use composition to attach multiple callback handlers to this function, without the need to change any code at all. For example, your callback function could itself invoke a list of other function pointers, one of them kicking the watchdog, the other one updating the display progress bar, etc.
Groo
+1  A: 

Watchdogs are great, but also a pain in the rear when your program or system does not fit them easily. They work best when you have code that looks (generally) like:

Watchdog_init();

hardware_init();
subsystem1_init();
subsystem2_init();
subsystem3_init();
...
subsystemN_init();

forever {
   Watchdog_tickle();

   subsystem1_work();
   subsystem2_work();
   subsystem3_work();
   ...
   subsystemN_work();
}

Very often you can design your program in such a way that this works, and generally it is very foolproof (but not totally).

But in cases like yours this does not work so well. You end up having to design and create (or possibly use a library for) a framework with various conditions that must be met and that control if and when the watchdog gets petted. This can be very tricky, though. The complexity of this code could itself introduce its own errors. You could very well write a perfect application except for the watchdog framework and your project may reset a lot, or all of your code might be bad and just continually pet the watchdog, causing it to never reset.

One good way to change the above code to handle more complicated situations would be to change the subsystemX_work functions to keep track of state. This can be done with static variables in the functions, or by using function pointers rather than functions and changing the actual function that is executed to reflect the current state of that subsystem. Each subsystem becomes a state machine.
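A small sketch of the function pointer variant, with one function per state. The modem_* names are invented for illustration, and the "modem ready" condition is simulated by a poll counter:

```c
/* Each state is a function; the subsystem stores which one runs next. */
typedef void (*state_fn)(void);

static void modem_state_reset(void);
static void modem_state_wait_ready(void);
static void modem_state_online(void);

static state_fn modem_state = modem_state_reset;
static unsigned ready_polls = 0;

static void modem_state_reset(void)
{
    /* start the reset, then hand over to the waiting state */
    modem_state = modem_state_wait_ready;
}

static void modem_state_wait_ready(void)
{
    /* stand-in for polling the modem: reports ready on the third poll */
    if (++ready_polls >= 3)
        modem_state = modem_state_online;
}

static void modem_state_online(void)
{
    /* normal, short slice of work */
}

/* Called once per pass of the forever loop, like subsystemX_work(). */
void modem_work(void)
{
    modem_state();
}
```

Because every call returns quickly, the Watchdog_tickle() at the top of the forever loop keeps running on schedule even while the modem is still starting up.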

Another way to go about working around long intentional waits with a quick biting watchdog is to break up the long running function into shorter pieces. Rather than:

slow_device_init();
Watchdog_tickle();

You could do:

slow_device_init_begin();
Watchdog_tickle();
slow_device_init_finish();
Watchdog_tickle();

And then extend this to stretch the watchdog timer by doing:

slow_device_init_begin();
for ( i = SLOW_DEV_TRIES; i ; i--) {
   Watchdog_tickle();
   if (slow_device_init_done()) {
       break;
   }
}
Watchdog_tickle();

Even still it can get more and more complicated. Often you end up having to create a watchdog delegate which just checks for conditions to be met and does or does not pet the watchdog based on these conditions. This begins to get very complicated. It can be implemented by making an object for each of your subsystems that has some method/function to call to test the subsystem's health. The health methods could be very complex and could even change as the state of that subsystem changes, though it should be as simple as possible so that it is as easy as possible to verify that the code is correct, and also because changes to how the subsystem works will require changes to how you measure health.

If you can ensure that some code runs at regular intervals then you could just have an integer for each subsystem that acts as the subsystem's local watchdog. Some code (maybe in a timer interrupt handler, but not necessarily) will decrement and test each subsystem's variable. If it reaches 0 for any subsystem then the watchdog is not tickled.

Watchdog_periodic() {
   for_each subsystem in subsystem_list { // not C, but you get the idea
      if ( 0 > --(subsystem->count_down) ) {
           // Do something that causes a reset. This could be returning and not petting
           // the hardware watchdog, doing a while(1);, or something else
      }
   }
   Watchdog_tickle();
}

Then each subsystem can tickle its own count_down for varying amounts of time by setting its count_down to a positive value.
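For what it's worth, here is one way the pseudocode might look in plain C. This is only a sketch: the fixed-size array, subsystem_alive() and the bool return value are my additions for illustration, the hardware tickle is replaced by a counter, and the "cause a reset" branch simply returns without tickling:

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_SUBSYSTEMS 2

struct subsystem { int count_down; };

static struct subsystem subsystems[NUM_SUBSYSTEMS] = { { 3 }, { 3 } };
unsigned hw_tickles = 0;             /* stand-in for the hardware watchdog */

static void Watchdog_tickle(void) { hw_tickles++; }

/* A subsystem reports that it is alive for another 'periods' checks. */
void subsystem_alive(size_t index, int periods)
{
    subsystems[index].count_down = periods;
}

/* Run at a fixed interval. Returns false, without tickling the hardware
   watchdog, as soon as any subsystem's count_down has run out. */
bool Watchdog_periodic(void)
{
    for (size_t i = 0; i < NUM_SUBSYSTEMS; i++) {
        if (subsystems[i].count_down <= 0)
            return false;            /* let the hardware watchdog expire */
        subsystems[i].count_down--;
    }
    Watchdog_tickle();
    return true;
}
```

A subsystem that expects a slow operation would call subsystem_alive() with a larger value just before starting it.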

You should also notice that this is really just a software watchdog, even though it may make use of the hardware watchdog to do the actual reset.

You should also note that the more complicated the watchdog framework, the more opportunity there is for errors in it, as well as opportunity for errors in other code to cause it to work improperly. For instance a pointer error such as:

int x;
fscanf(input, "%i", x); // Passed uninitialized x rather than address of x

could result in setting some subsystem's count_down value, which could end up keeping the watchdog from biting when it should.

nategoose