Upgrading software for embedded devices often has the possibility of "bricking" the device, e.g. if power should happen to fail while in the midst of writing software to FLASH. Two questions:

  1. What are some best practices for implementing the upgrade mechanism so as to minimize the probability that the device will be "bricked"?
  2. What are some best practices for making the upgrade process fail-safe, so that events like power failures while installing software to FLASH can be recovered from?
+2  A: 

Use checksums on the internal flash, with a default backup to fall back on if the CRC/checksum doesn't match. That way, if the device computes an incorrect checksum, it knows the upgrade was incomplete and can revert to the default/previous firmware stored on another device.

This requires some pre-boot code (maybe in the bootloader) to verify the checksum; it should be a static bit of code.
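
As a rough sketch of that pre-boot check (not taken from any particular bootloader): the `image_t` descriptor and the `select_image` helper below are hypothetical names, and the CRC is a plain bitwise CRC-32. On real hardware the data pointers would refer to flash regions rather than RAM buffers.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, initial value and
 * final XOR of 0xFFFFFFFF). Slow but tiny, which suits a bootloader. */
static uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the dropped bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Hypothetical image descriptor: the payload plus the CRC stored with it. */
typedef struct {
    const uint8_t *data;
    size_t         len;
    uint32_t       stored_crc;
} image_t;

static int image_is_valid(const image_t *img)
{
    return crc32_calc(img->data, img->len) == img->stored_crc;
}

/* Returns the image to boot: the primary if it is intact, otherwise the
 * backup, otherwise NULL (nothing bootable, so stay in the bootloader). */
static const image_t *select_image(const image_t *primary,
                                   const image_t *backup)
{
    if (image_is_valid(primary)) return primary;
    if (image_is_valid(backup))  return backup;
    return NULL;
}
```

The point of keeping this code static is that it is the one piece that must still work after a failed upgrade, so it should never be in the region being rewritten.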

edit: Further to comments elsewhere: if you want to check for the wrong firmware, not just corrupt firmware, your checksum/check data can encapsulate version information too (with a check of that header). I think Linksys routers do this, which can make them problematic to reflash with custom firmware.

Aiden Bell
+2  A: 

Checksums are good, but they only save you from flashing corrupted data. What if you flash an image file with a valid checksum but for a different product model? A read-only default boot loader that can be accessed in case of emergency corruption is the best thing I have seen.

ThePosey
You could add model information to the image (which would be verified as part of the checksum). The boot loader would check it, flag images with the wrong model as invalid, and restore from backup.
Aaron Maenpaa
Checksum/check data could encapsulate version data. I think Linksys WRT* routers do this.
Aiden Bell
Use a header which must be checked first; see my answer elsewhere on this page.
Steve Melnikoff
Have the loaded firmware (not the bootloader) check the product and do a self test or whatever. When it is finished, set a flag in the image to say bootup was successful. After reset or watchdog, the boot loader checks that bit is set and if not uses the old backup image.
Tom Leys
+7  A: 

It all depends on how critical the application is. The two basic approaches (backup and bootloader) are also combined sometimes.

Many systems have a read only bootloader (like redboot), and then two banks of flash memory (on the same chip, most often). The bootloader then has a flag to choose which bank to boot from. The flag will then change based on events like upgrades (failed or successful), and so on.

So, when upgrading, the running version copies the new load into the backup bank, checks the checksum, toggles the boot flag, and then reboots the device. The device reboots on the new bank, with the new load. After the reboot, the new load can copy itself into the backup bank.

Often there is also a watchdog timer with a hardware reset. This way, if the firmware goes insane, it fails to kick the watchdog, the hardware reset will reboot the device, and the bootloader will look for a sane load.
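
A minimal sketch of the boot-flag logic, assuming a small persistent record that both the application and the bootloader can read. The `boot_record_t` layout and the function names are made up for illustration, and the per-bank "ok" bits stand in for whatever checksum test the bootloader really runs.

```c
#include <stdbool.h>
#include <stdint.h>

enum { BANK_A = 0, BANK_B = 1 };

/* Hypothetical persistent boot record. On real hardware this would live
 * in a reserved flash sector or EEPROM, not in RAM. */
typedef struct {
    uint8_t active_bank;  /* bank the bootloader should try first */
    bool    bank_ok[2];   /* stand-in for the checksum result of each bank */
} boot_record_t;

/* Upgrade step, run by the application after it has copied the new load
 * into the inactive bank. The flag is toggled only if the load verified. */
static bool commit_upgrade(boot_record_t *rec, bool new_load_checksum_ok)
{
    uint8_t inactive = (uint8_t)(rec->active_bank ^ 1u);
    rec->bank_ok[inactive] = new_load_checksum_ok;
    if (!new_load_checksum_ok) {
        return false;     /* keep booting the old bank */
    }
    rec->active_bank = inactive;
    return true;          /* caller reboots into the new bank */
}

/* Bootloader step, also reached after a watchdog reset. Returns the bank
 * to boot, or -1 if neither bank holds a sane load. */
static int choose_bank(const boot_record_t *rec)
{
    uint8_t other = (uint8_t)(rec->active_bank ^ 1u);
    if (rec->bank_ok[rec->active_bank]) return rec->active_bank;
    if (rec->bank_ok[other])            return other;
    return -1;
}
```

Note that the flag write is the only step that changes which code runs, so a power failure at any earlier point leaves the device booting the old bank.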

The Open Mesh project is a good example of this approach.

Sean Cavanagh
just curious - what if you need to upgrade the boot-loader?
sybreon
@sybreon: the application and bootloader reverse roles, using a similar strategy. In other words, the application code could contain the ability to download a new bootloader, with space for two bootloaders reserved. Once a new bootloader is downloaded and verified, a final write is done (most likely to the reset vector, or some immutable startup code), to ensure that the new bootloader is called at startup.
Steve Melnikoff
In general, the bootloader is so simple and does such basic things that there is no need to upgrade it.
mouviciel
+3  A: 

More specifically...

Download the replacement image to an area of memory without overwriting ANY of the current program space. Wait until the download is complete, THEN compute and compare CRCs.

If space is really a problem, you can do the 'default backup' AKA 'recovery mode' sort of thing, but it's much slicker to not do this destructively.

If you're -really- slick... you can do a single write update to FLASH to direct the device to boot from the new code location. This will ping/pong between two totally separate code sections. This is about the safest way you can do this:

  • ALWAYS have a non-updatable recovery bootloader (Nano-loader) which can be signaled to load new code somehow if everything goes wrong.
  • Two separate program spaces
  • Each program space has a "CRC" field, a "burn number" (higher than the other code page's number), and an "invalid" word (all Fs, so updating the "invalid" marker doesn't require an erase)
  • Once a download is complete, verify the CRC. If it's good, burn the 'invalid' marker on the old version's program space.
  • The Nano-loader checks the 'invalid' marker to know which to boot to. In the case that they're both valid, do a CRC check. If they're still both valid, then take the higher burn number entry
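
Here is one way the Nano-loader's decision could look, as a sketch only. The `slot_t` layout, `retire_slot` and `pick_slot` are hypothetical names. Two simplifications to be aware of: this version runs the CRC on every slot that is not marked invalid (not only when both are unmarked), and it ignores burn-number wraparound.

```c
#include <stddef.h>
#include <stdint.h>

#define MARKER_ERASED 0xFFFFFFFFu  /* erased flash reads back as all Fs */

/* Hypothetical per-slot bookkeeping, following the bullets above. */
typedef struct {
    uint32_t       invalid_marker;  /* all Fs = in use, anything else = retired */
    uint32_t       burn_number;     /* higher = burned more recently */
    uint32_t       stored_crc;
    const uint8_t *code;
    size_t         code_len;
} slot_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Burning the marker only clears bits (1 -> 0), so no erase is needed. */
static void retire_slot(slot_t *s)
{
    s->invalid_marker = 0u;
}

static int slot_usable(const slot_t *s)
{
    return s->invalid_marker == MARKER_ERASED &&
           crc32_calc(s->code, s->code_len) == s->stored_crc;
}

/* Returns 0 or 1 for the slot to boot, or -1 to stay in the Nano-loader. */
static int pick_slot(const slot_t slots[2])
{
    int ok0 = slot_usable(&slots[0]);
    int ok1 = slot_usable(&slots[1]);
    if (ok0 && ok1)
        return slots[1].burn_number > slots[0].burn_number ? 1 : 0;
    if (ok0) return 0;
    if (ok1) return 1;
    return -1;
}
```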

Oh, and when people say checksum... don't 'check the sum'... Do a proper CRC.
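
To make the difference concrete, here is a small comparison sketch (function names are my own). A plain additive sum does not care about byte order, so two swapped bytes go unnoticed, while a CRC-16 (the CCITT-FALSE variant here, chosen only as an example) catches it.

```c
#include <stddef.h>
#include <stdint.h>

/* Naive additive checksum: just adds up the bytes modulo 2^16. */
static uint16_t sum16(const uint8_t *d, size_t n)
{
    uint16_t s = 0;
    for (size_t i = 0; i < n; i++)
        s = (uint16_t)(s + d[i]);
    return s;
}

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
 * no bit reflection, no final XOR. */
static uint16_t crc16_ccitt(const uint8_t *d, size_t n)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= (uint16_t)((uint16_t)d[i] << 8);
        for (int b = 0; b < 8; b++) {
            crc = (crc & 0x8000u) ? (uint16_t)((crc << 1) ^ 0x1021u)
                                  : (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

For example, the byte sequences `12 34 56 78` and `34 12 56 78` give the same `sum16` result but different `crc16_ccitt` results.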

darron
+1  A: 
  1. Keep a read-only bootloader in memory no matter what
  2. In the bootloader, allow a failsafe method (e.g. by holding button X down during restart) of reloading new program memory from an available source of input (SD card, RS232, whatever).
Jason S
Your method only works if it is feasible to send technicians out to every device (i.e. not suitable for sensor mesh networks)
Tom Leys
A: 

On some laboratory (rather than consumer) devices, I've seen the boards built with programmer circuits. In the worst case, your users can open the case, plug into the programmer and reload the default software, or send it back for you to do the same. 'Course, this costs real money.

Some custom boards used in some of my projects have had replaceable ROMs. This is cheaper, but less convenient.

dmckee
+1  A: 

To answer both questions, and regardless of any particular hardware resources:

  • Ensure that, before running any application code (at startup, or after a download has completed), the bootloader does a CRC check on the application. If it isn't valid, the bootloader doesn't run the code.

  • If the bootloader decides that it can't run the application code, it must have the ability to signal this to the user, and to start the download again.

These clearly become more important if the processor has insufficient flash to store a backup application, or insufficient RAM to store the new one until it's finished downloading.

In these cases, it makes sense for the file being downloaded to have a small header which allows the bootloader to determine that the file is appropriate to the system. This header could also have a CRC. If the header is valid for this system, and the CRC is correct, then the bootloader can erase the flash (but not itself!), and continue the download. If not, it aborts without touching the existing application code.
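
A sketch of such a header check, run before any flash is erased. Everything specific here is an assumption for illustration: the `fw_header_t` layout, the `FW_MAGIC` and `THIS_MODEL_ID` values, and the `APP_FLASH_SIZE` limit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FW_MAGIC       0x46574844u      /* hypothetical file signature */
#define THIS_MODEL_ID  0x0042u          /* hypothetical ID of this product */
#define APP_FLASH_SIZE (128u * 1024u)   /* hypothetical application area */

/* Hypothetical download header; fields are ordered so there is no padding. */
typedef struct {
    uint32_t magic;
    uint16_t model_id;
    uint16_t version;
    uint32_t image_size;
    uint32_t image_crc;   /* CRC of the payload, checked after download */
    uint32_t header_crc;  /* CRC of all the fields above */
} fw_header_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* True only if the download may proceed. The caller erases flash after
 * this returns true and otherwise leaves the existing application alone. */
static bool header_acceptable(const fw_header_t *h)
{
    if (h->magic != FW_MAGIC)
        return false;
    if (crc32_calc((const uint8_t *)h, offsetof(fw_header_t, header_crc))
            != h->header_crc)
        return false;                     /* header itself is corrupt */
    if (h->model_id != THIS_MODEL_ID)
        return false;                     /* valid file, wrong product */
    if (h->image_size == 0u || h->image_size > APP_FLASH_SIZE)
        return false;                     /* would not fit */
    return true;
}
```

Because the header is only a few bytes, it fits in RAM even on parts that cannot buffer the whole image, which is the case this answer is aimed at.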

Steve Melnikoff
+1  A: 

In my experience it comes down to what your cost point is. If you can afford to have twice the code/data space on the device that you need, and can reboot, it's simple: store the new version in all your extra space and do proper checksum checking on the new image. I also recommend deeper checking of the new firmware image, as it could be spoofed if you don't: for example, some sort of encrypted key guaranteeing the origin of the new image. You can do this with external memories as well; for instance, in one project using a PIC I had an external EEPROM for the new firmware image and a flag that, if set on boot, would load a new image from the EEPROM.

If you are not so lucky, i.e. you can't afford that space, then life gets more interesting, and there is likely no guaranteed way to completely avoid the chance of failure during an update. In all cases the bootloader should be in a write-protected piece of memory and, if you can afford the space, should have some basic form of external connectivity. I've done systems with very simple USB drivers in the bootloader and systems with REALLY simple UDP-only network stacks. Either will allow you to at least get a new image to the device if there is a failure during an update. In these cases I highly recommend putting the bootloader in a read-only memory area: you lose the ability to update it, but a haywire update also won't leave you with a completely bricked device. In such a case the bootloader is small enough that you can be pretty sure of its correctness.

The last possibility is a system that needs to update some portion of its code while running... This is really tricky and usually requires complex control over the location of functions in memory and major planning ahead in the memory layout to achieve. It's not overly fun, but it's doable.

+1 - Nice explanation of why the backup-restore solution doesn't always work. I have used the same offchip EEPROM idea in a deployed solution.
Tom Leys
Welcome to Stack Overflow Mark!
Tom Leys
A: 

I know this Q is answered, but some people have a need for more reliability. If your project is really mission critical you can go this route.

Basic plan is to always have a backup plan that cannot fail.

  1. Have a PIC or other microcontroller that can program the real processor's flash memory. Make it use a checksum on a block of data, and interface to it via serial, USB or even Ethernet (don't laugh, it's not that hard). This device CANNOT be reprogrammed in the field (or maybe even EVER), so you always have a backup plan. PICs can run web servers over PPP/SLIP or Ethernet, so interfacing to it is NOT necessarily awkward. Google TCP-Lean. Try not to titter. (Site is suitable for work.) Put the programming port somewhere else. Security is not ensured.

  2. The program runs on the main CPU and runs a boot loader of its own. You have three programs: a boot loader, a maintenance program and the real program.

This allows upgrades to the boot loader process, as well as the program. You can run a backup boot loader in some extra flash, with a watchdog to reset you and use the backup if you don't boot.

So you have upgradeable embedded application, upgradeable boot loader, and upgradeable maintenance mode. AND a backup mode that cannot fail.

Hope nobody needs to find this useful.

Tim Williscroft