Consider the radio in a car: do the electromagnetic fields the car passes through interfere with its processing? It's easy to see that a strong field can corrupt stored data. But what about data that is being processed? Can it also be changed?

If so, how could you protect your code against this? (Without electrical protection, using only code.)

A: 

Could you create a pair of twin processes, compare the program counter of each one, and check that the results match at all times? That way you could detect any errors.
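
A minimal sketch of the duplicate-and-compare idea, assuming the computation is deterministic and has no side effects. `run_twice_and_compare` and `RedundancyMismatch` are hypothetical names for illustration. Note that running both copies on one processor only catches transient upsets during one of the runs; a persistent hardware fault would corrupt both copies the same way.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class RedundancyMismatch(Exception):
    """Raised when the redundant runs keep disagreeing."""


def run_twice_and_compare(compute: Callable[[], T], max_retries: int = 3) -> T:
    """Run `compute` twice and return the result only if both runs agree.

    A transient bit flip during one run shows up as a mismatch, and we retry.
    With only two copies we can detect an error but not tell which run was wrong.
    """
    for _ in range(max_retries):
        first = compute()
        second = compute()
        if first == second:
            return first
    raise RedundancyMismatch(f"results disagreed on {max_retries} attempts")
```

For example, `run_twice_and_compare(lambda: sum(range(1000)))` returns `499500` when both runs agree.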

fmsf
You don't want it running on the same processor really, so just twin processes wouldn't do you much good.
workmad3
Brian Knoblauch
A: 

I doubt you can.

Code that is changed won't run, so likely your program(s) will crash if you have this problem.

This is a hardware problem.

Lasse V. Karlsen
A: 

For the most robust mission-critical systems, you use multiple processors and compare their results. This is what we did with an aircraft autopilot (autolanding). We had three autopilots: one flying the aircraft and two checking it. If any one of the three disagreed, it was shut down.
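
A minimal sketch of 2-out-of-3 voting over three redundant channels, in the spirit of the setup described above. `majority_vote` is a hypothetical helper; it assumes the channel outputs can be compared for exact equality, whereas a real autopilot would compare within a tolerance and handle timing between channels.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def majority_vote(outputs: Sequence[Hashable]) -> Tuple[Optional[Hashable], List[int]]:
    """2-out-of-3 vote over redundant channel outputs.

    Returns (agreed_value, dissenting_channel_indices). The dissenting
    channels are the ones to shut down. If no value has at least two
    votes, returns (None, []) and the caller must fail safe.
    """
    if len(outputs) != 3:
        raise ValueError("expected exactly three channel outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        return None, []
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters
```

For example, `majority_vote([12.5, 12.5, 13.0])` returns `(12.5, [2])`: the third channel is outvoted and would be shut down.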

Jim C
What would the likelihood be of all three autopilots having the same error? I know it's tiny, but considering the risk, I think it's worth considering...
workmad3
I'm assuming that, if the one flying the aircraft was shut down, one of the others woke the pilot. :-)
Adam Liss
A: 

You're referring to what Wikipedia calls soft errors. The traditional, industry-accepted workaround is redundancy, as Jim C and fmsf noted.

Several years ago, our repair department's analysis showed an unacceptable number of returned units with single-bit errors in the battery-backed SRAM that held the firmware. Despite our efforts at root-cause analysis, we were unable to explain the source of the problem. At that point a hardware change was out of the question, so we needed a software-only solution to treat the symptom.

We wanted a reliable fix that we could implement simply and quickly, so we generated parity checks on blocks of code in the SRAM. We chose a block size that required very little additional storage for the parity data, yet provided enough redundancy to detect and correct any of the errors we'd seen and then some. It logs the errors it detects and indicates whether it can correct them, so we still know when bit errors occur in the field. So far, so good!
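
The post doesn't say exactly which check scheme was used. A single parity bit can only detect an error, so correcting one needs more check bits, as in a Hamming code. Below is a minimal sketch of one such compact scheme, assuming per-block check data: it stores the XOR of the indices of all set bits (the syndrome) plus one overall parity bit. For a 1 KiB block (8192 bits), that is 13 + 1 = 14 check bits. It corrects any single-bit error and detects, but cannot correct, double-bit errors. `check_bits` and `verify_and_correct` are hypothetical names, not the author's code.

```python
from typing import Tuple


def check_bits(block: bytes) -> Tuple[int, int]:
    """Return (syndrome, parity) for a block.

    syndrome: XOR of the bit index of every set bit in the block.
    parity:   overall parity (XOR of all bits).
    """
    syndrome = 0
    parity = 0
    for byte_index, byte in enumerate(block):
        for bit in range(8):
            if (byte >> bit) & 1:
                syndrome ^= byte_index * 8 + bit
                parity ^= 1
    return syndrome, parity


def verify_and_correct(block: bytearray, stored: Tuple[int, int]) -> str:
    """Compare the block against its stored check bits, fixing one flipped bit in place."""
    syndrome, parity = check_bits(block)
    diff = syndrome ^ stored[0]  # flipping bit i changes the syndrome by exactly i
    parity_changed = parity != stored[1]
    if not parity_changed and diff == 0:
        return "ok"
    if parity_changed and diff < len(block) * 8:
        # Odd number of flips: assume exactly one, located at bit index `diff`.
        block[diff // 8] ^= 1 << (diff % 8)
        return "corrected"
    # Even number of flips (e.g. two): detected but not correctable.
    return "uncorrectable"
```

In the spirit of the answer, the caller would log every result other than `"ok"` so that field errors remain visible even after they are corrected.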

Our product manager did some additional research out of curiosity and convinced himself that the culprit was cosmic radiation. We never proved it unequivocally, but he was satisfied that the number of errors seemed to agree with what would be expected based on the data he found. I'm just glad the returns have stopped.

Adam Liss