views:

3143

answers:

39

What software practices are being used in mission-critical industries where safety is paramount? For example nuclear power plant.

Update
Originally this question was: How would you develop software for a nuclear plant? I have changed it to save good answers. I'm also making this question community wiki. Please help to word it better!

+19  A: 

I am not an expert; I'm just relaying what I've heard.

For mission critical systems, the reference language is Ada. The development process is very strict and focuses on a test driven strategy with very small and highly tested (and stressed) routines. To address the potential of a crash, there is not only one system, but multiple redundant systems (in terms of sensors and processing units) which perform a "voting" procedure.

I don't know more than this.
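The redundant "voting" procedure the answer describes can be sketched in a few lines. This is a toy illustration, not any real plant's algorithm; the function name, tolerance and quorum rule are all invented:

```python
def vote(readings, tolerance=0.5):
    """Return the median of redundant sensor readings, provided a
    majority agree within `tolerance`; otherwise raise.  A toy
    sketch of majority voting, not a real plant's algorithm."""
    ordered = sorted(readings)
    median = ordered[len(ordered) // 2]
    agreeing = [r for r in ordered if abs(r - median) <= tolerance]
    if len(agreeing) * 2 <= len(ordered):
        raise RuntimeError("no quorum among redundant sensors")
    return median

# Three redundant temperature sensors; one has failed high.
print(vote([301.2, 301.4, 999.9]))  # -> 301.4
```

The point of the pattern is that a single failed sensor or processing unit is outvoted rather than trusted.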

Stefano Borini
It's Ada, not ADA. Ada is not an initialism or acronym, it was named after Ada Lovelace.
Bryan Oakley
fixed it, thanks
Stefano Borini
Link to Ada info, it sounds interesting and was built under contract of the DoD: http://en.wikipedia.org/wiki/Ada_%28programming_language%29
rmoore
Wasn't it an Ada bug that caused the gazillion dollar Ariane 5 to crash?
Hans Malherbe
Don't remember. I do remember, however, that it was an overflow of a 64-bit value stored in 16 bits. This led to a crash of the primary system, kicking in the secondary system, which had the same bug. And then fireworks...
Stefano Borini
@Hans, actually, according to wikipedia, the problem was caused by disabling a language feature for performance reasons
Jader Dias
@Hans: It's more complicated. The critical bad decision was to reuse a software component from Ariane 4 without thoroughly checking to see if it worked on Ariane 5. Ariane 5 lifts off a whole lot faster than Ariane 4, so a position indicator overflowed a 16-bit number.
David Thornley
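The failure mode the commenters describe (a large value crammed into a signed 16-bit field) can be reproduced with Python's struct module. This is a sketch of the arithmetic only, not the actual Ada code:

```python
import struct

def to_int16(value):
    """Round-trip a value through a signed 16-bit field, the width
    the reused alignment code assumed on Ariane 4.  struct raises
    on overflow instead of silently wrapping, loosely standing in
    for the unhandled Operand Error."""
    return struct.unpack("<h", struct.pack("<h", int(value)))[0]

print(to_int16(20000))  # Ariane 4's values stayed in range
try:
    to_int16(40000)     # Ariane 5 is faster; 40000 > 32767
except struct.error as exc:
    print("overflow:", exc)
```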
+10  A: 

I remember about a month or so ago there was an article on Slashdot about how NASA develops defect-free software (that's how it was referred to), with a specific example from NASA. They made sure that they had very clear specs (written, IIRC, using Z) and had lots of testing, etc. You can find one of their documents here.

I am trying to find the link, but can't see it atm. Will post it later when I can find it.

In general, I would say that the following would be important:

  • Make sure that you have very specific specifications (use Z or some other formal language)
  • You can never write too many tests (for such a high risk application)
  • Make sure that you choose the correct framework / language / development environment (e.g. IIRC the Java license does not even permit the use of Java in nuclear plants)


EDIT: marcc found this link (the second link shows everything on one page), which explains a bit more about how NASA operates, but isn't the link I was looking for.

a_m0d
I think the article you refer to is at http://www.fastcompany.com/magazine/06/writestuff.html
marcc
Because NASA specs have such a great reputation for their clarity... given their Mars Orbiter debacle with the Imperial/Metric unit mishap...
BenAlabaster
@marcc: No, that doesn't seem like the one I was referring to, but will add it into my answer
a_m0d
+5  A: 

Very carefully. Or not at all.

docgnome
+11  A: 

A formal specification language such as Z or Object-Z is a must. The software producing organization should have a high CMMI level as well, 4 or 5.

NiKUMAN
I'd go with formal methods too. I learnt Z and B a long time ago at university. At the time Z had a free type checker (fuzz) but no tools for refining the specification to an implementation. The B-Method, on the other hand, had much better commercial tool support, allowing a specification to be turned into an implementation. I've never heard of Object-Z; I'll have to take a look.
pjp
Are you saying the CMMI requirement is a legal requirement?
Marco Mustapic
High CMMI is useful to prove that your team is doing the right thing almost all the time. These people work very well when put in an environment where correctness is paramount. But a high CMMI in and of itself is not an indicator that software can be done well, only that it can be done reliably. On the other hand, people who write well-designed and correct software tend to write good software anyway.
Spence
+71  A: 

Well, not Java. According to the license agreement ...

You acknowledge that Licensed Software is not designed or intended for use in the design, construction, operation or maintenance of any nuclear facility

JP Alioto
I guess I will have to use PHP then ...
too much php
Wait, so Sun is an inappropriate name for the company. Lulz.
Overflown
It's oracle now I think!
JP Alioto
We had to rewrite an iTunes-based reactor control system for the very same reason
dbr
@1alstew1 Sun is really pushing for the use of solar power as they have complete monopoly in that market.
too much php
LOL That's a great restriction! Sun doesn't want to be blamed for Java being slow and crashing and blowing up half a state.
Chris Pietschmann
+3  A: 

Can you say what you are trying to learn from answers to this question?

Possibly you're wondering, "Could we do more to make sure our software doesn't break? What if we wrote software like they do for nuclear power plants?"

If so, then I don't think you've found the correct analogy. The cost of bugs in the systems of a nuclear power plant would be so high that it's possible software is not even permitted.

If this is what you're looking for, then I think you should look for examples of software where failure would be very expensive, but would not be life-threatening. Maybe systems that deal in millions of dollars per second, I don't know. But I think you want something achievable.

Chances are the differences aren't so much in QA as in process: making sure the bugs never get into the code in the first place.

John Saunders
Actually I just wanted to hear some war stories but apparently there is no such thing as "software that controls nuclear plant".
Pavel Chuchuva
War stories from people who design software for nuclear power plants? O_o
Eric
I like the irony of the implication of that question Eric... hahaha, that's awesome :D
BenAlabaster
+9  A: 

You wouldn't believe how much stuff runs on 30-year old software.

But that's beside the point. The answer you're looking for is that nuclear power plants, and pretty much most power plants, and all other kinds of installations, don't rely on software for their running operations. Think of it: how old are some power plants, and how long is their predicted lifetime (which a lot of them exceed)? Do you really wish to make them rely on buggy software on operating systems that change every 10 years?

No, when it comes to that kind of installation, you have physical control mechanisms (from valves up to relays, up to ...), with alarms, then some more valves, then some more alarms and human monitoring, and then maybe some software control of processes (but that software still can't bypass the valve) ... you see my point, probably.

How does that old saying go?

If architects built buildings the way programmers build software,
the first woodpecker that came along would destroy civilization.

ldigas
"If architects built buildings the way programmers build software ..." - I'm kind of offended by this. Architects and programmers are not the same ... architects aren't usually asked to use a building material beyond its capabilities or to completely restructure the foundation near project completion. I can't tell you how many times I've had to defend my software from a client who wants to add functionality to the UI specific to an external add-on, or combine multiple data into one column, or combine unrelated databases into one. It's a sad state, but mostly because the material is so malleable.
John MacIntyre
@John - I'm sorry you feel offended by it. I didn't say it, I just quoted it. Although I agree with it. As far as your complaint goes, all jobs have their demands. Programmers are not the only ones with unrealistic (crazy?) client demands.
ldigas
@ldigas - I would say that it's a little unfair to tar all programmers with the same brush. It's a gross overgeneralization that as a stereotype I don't think I could refute. However, at least in this instance I'd say that developers in this type of environment don't write your average business application and don't think or program in the same mindset. Every line of code is designed with the thought "what needs to happen if this line of code fails". *Most* general business programmers, however, don't consider this until after the fact.
BenAlabaster
@ldigas - ... sorry about the rant. ... I've attempted to come up with a more comprehensive response, but I just start ranting again. ;-)
John MacIntyre
@balabaster - It's all about the business commitment. You could develop some really robust software given the priority. Instead, most software is written with zero commitment even for end user testing.
John MacIntyre
@balabaster - Most sayings of this type are based on stereotypes. But why does it bother you that much? I thought it was funny, until now. However, I really don't feel like starting a flame war, so if it offends you, I'll delete it. As for the other part of your comment: no, not really, they're not written with that thought in mind. You'd be surprised. People's ideas of how power plants work and operate are grossly distorted by the media.
ldigas
They're very boring places, actually. Normally, the most exciting thing that happens in there is when they get a new kind of dessert in the canteen.
ldigas
Er, didn't offend me. Don't delete it on my account. Was just stating my position, wasn't trying to be an ass or start a war.. nuclear or any other kind :P
BenAlabaster
@balabaster - :-) Aaah, wasn't that important. And I don't feel like writing it all over again anyways :-)
ldigas
@ldigas that's what Rollback is for ;)
BenAlabaster
@balabaster - uf, I forgot about that. I still think of these forums as modifiable usenet.
ldigas
+1  A: 

Don't use Java. It is not approved for use in nuclear power plants.

xeon
+7  A: 

Depending on what the nuclear plant does with the material, be careful about what Google software you use in connection with your development work. By agreeing to their Terms of Service, you're also agreeing to:

(iv) not license, sell, provide or distribute the Software for use in connection with chemical, biological, or nuclear weapons or missiles capable of delivering such weapons

Sources:

https://registration.keyhole.com/download_earth_pro.html

http://sketchup.google.com/download/license_pro.html

http://toolbar.google.com/gmail-helper/terms_mac.html

The Matt
Because the first place I go for my missile guidance systems is Google Maps... :P
BenAlabaster
+1  A: 

Erlang, with reported cases of 99.9999999% availability, would be a serious candidate language.

I'd also count on a very experienced team and a lot of effort on code coverage and stress testing.

Reginaldo
availability does not imply correctness
Eric
That's why I'd also dedicate lots of effort on testing.
Reginaldo
Nuke plant software needs to be both highly available and extremely fault tolerant. Doesn't Erlang achieve its high availability by shooting errant processes in the head? I have to question the use of a functional language in software whose whole point is to cause side effects.
Hans Malherbe
+43  A: 

I would use Eiffel with Design by Contract for correctness. Coincidentally, it's already used in nuclear plants.

DbC attaches a precondition and postcondition to each method, checked by the runtime during development, along with class invariants that are checked before and after each method invocation. Together, the preconditions, postconditions and invariants form an exact specification of a module's interface.

http://dev.eiffel.com
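Eiffel enforces contracts natively; a rough approximation in Python, with an invented `contract` decorator standing in for require/ensure clauses (class invariants omitted for brevity), might look like this:

```python
def contract(pre=None, post=None):
    """Check a precondition before and a postcondition after each
    call: a rough analogue of Eiffel's require/ensure clauses,
    not Eiffel's actual mechanism."""
    def wrap(fn):
        def checked(*args):
            if pre is not None:
                assert pre(*args), "precondition violated"
            result = fn(*args)
            if post is not None:
                assert post(result, *args), "postcondition violated"
            return result
        return checked
    return wrap

@contract(pre=lambda x: x >= 0, post=lambda r, x: abs(r * r - x) < 1e-9)
def sqrt(x):
    return x ** 0.5

print(sqrt(9.0))  # -> 3.0; sqrt(-1.0) would fail the precondition
```

In Eiffel these checks are typically enabled during development and can be compiled out for production; the contract itself remains as checked documentation of the interface.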

I believe this post deserves to be voted much higher up, provided the Eiffel statement is correct :)
Cecil Has a Name
+9  A: 

Apparently C++ is good enough for managing nuclear warheads. See the Coding Standards for the Joint Strike Fighter (PDF).

too much php
That's because C++ often tends to blow up, which makes it a perfect candidate for the application. ;)
kigurai
@kigurai - beautifully put :-))
ldigas
+37  A: 

Over a VPN. As far away as possible

Gerard
What if it crashes and you need to reboot?
drikoda
Get one of the chaps in the lead pants to press the reset button
Gerard
This is hard real-time software, so if it crashes there aren't many chances that you could do anything anyway.
eKek0
+4  A: 
Tatiana Racheva
+14  A: 

A strange game. The only winning move is not to play.

RichieHindle
Couldn't resist: http://xkcd.com/601/
ya23
Wait, even that loses. Perhaps you should program a chess game instead?
nilamo
but I'm crap at chess!
Breton
+1  A: 

Use a formal method with which you can mathematically prove that you are not going to fail.

There are methods such as the B-Method that are used in safety critical systems, notably the Paris metro.

John Nolan
It's nice to take a specification and refine it to the implementation proving correctness as you go along. The only trouble is making sure that the specification is correct in the first place.
pjp
+2  A: 

I think I would spend 98.99999997% of my time writing test cases and testing my code. I would spend 1% writing code and the remainder on StackOverflow.

ojblass
Only 0.00000003% of time in StackOverflow? With 220 8h work days per year, that would mean about 1 second of StackOverflow every five years... what a sad perspective :)
Rene Saarsoo
+1  A: 

I'd spend most of the time writing a detailed spec, as the guys at NASA do.

I found this article very interesting.

ya23
That's all well and good until they realise that half was written using imperial measurements and half was written in metric... did them a lot of good then :P
BenAlabaster
+1  A: 

I would only work telecommute. Better yet, from some other continent.

And for the design I would definitely recommend a hardware switch that cuts out PC control and puts everything on manual.

I wonder if there are developers who work on nuclear plant software among the stackoverflow.com audience.

There surely are such developers somewhere, like the guys working at CERN (haven't seen them alive though).

There should also be developers who work on the Large Hadron Collider. They have likely already made a few bugs there. Though the thing crashed after a few days of operation, there is likely to be a memory leak. I mean, I used to find a few things on my desktop in Germany somewhat shifted from the position I left them in, in the direction of Switzerland (a micro black hole or whatever it was they created but did not properly dispose of). Scary...

User
+12  A: 

I'd probably use some combination of Windows ME and Visual Basic 6. Then I'd RUN LIKE HELL.

Mike Robinson
Damn, you must be able to run reeeeeeeal fast. Must be like the "Bomb Disposal - If you see me running, try and keep up".
BenAlabaster
We should send you to Iran to work on their program.
Dave Markle
A: 

Remotely, if possible.

Kevin Pang
+1  A: 

For industrial control systems using PLCs, there are tools available that can analyze every possible state of the software. By using this data (in the form of state graphs, for example) you can see whether there are any dead ends or other strange situations, and thus rewrite the program to prevent those states from even existing.

Disclaimer: I really have no idea how nuclear software is made, but I believe that such tools would be really helpful for this kind of application.
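The state-graph analysis described above amounts to a reachability search over the transition system. Here is a toy model; the valve states, transitions and function name are all invented for illustration:

```python
from collections import deque

def dead_ends(transitions, start):
    """Breadth-first search of a state graph; return the reachable
    states with no outgoing transitions (the 'dead ends' such an
    analysis tool would flag)."""
    seen, queue, stuck = {start}, deque([start]), []
    while queue:
        state = queue.popleft()
        successors = transitions.get(state, [])
        if not successors:
            stuck.append(state)
        for nxt in successors:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return stuck

# Toy valve controller: once "jammed", no transition leads out.
graph = {"closed": ["opening"], "opening": ["open", "jammed"],
         "open": ["closed"], "jammed": []}
print(dead_ends(graph, "closed"))  # -> ['jammed']
```

Real PLC verification tools do this exhaustively (and symbolically) over far larger state spaces, but the flagged output is the same idea: states you can enter and never leave.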

A: 

I'm sure a lot of the "software" actually used in nuclear power plants runs on Windows, like at most businesses: Excel, Word, Acrobat and Outlook. I'm sure they have boring old CRUD applications for rod inventory.

Nuclear power plants, like most large systems, are going to be made up of a combination of digital and analog controls and embedded and general purpose computers. The components will be programmed in a variety of languages and the choice in each case is going to be dictated by the individual requirements.

Cade Roux
+1  A: 

Definitely not with agile, and yes with a waterfall process.

Among the tools I would use, there would certainly be formal ones that I could verify with some mathematics, like Petri nets.

If the software had to run on Windows I would write it in Delphi, simply because Java's license doesn't allow it.

eKek0
+3  A: 

Use what the regulator allows. No, seriously, you do NOT get to choose sometimes. It's quite possible that people will suggest crazy things like commodity operating systems.

This is the same mess the US SCADA industry is in with little to no security.

So my money would be on locked-down Solaris X (it has quite nice real-time support, in addition to being like a bank vault).

Ravenscar Ada springs to mind for the code. As noted, you can't use Java. I've used Real-Time Java for weapon systems and it works really well; maybe one day Real-Time Java will be okay for nuclear plants, in which case it would be a good fit.

Big ups for heavy formal methods, and for using a whole-system simulator built in Matlab or similar. No, I'm not smoking crack: the flight-system guys now use Matlab's code generator, at least for simulation.

And really heavy testing. Yes Veronica, we will be expecting 100% scenario coverage.

Tim Williscroft
Did you have a problem with the weapons not going off in time? :P
BenAlabaster
+1 for the mention of SCADA which nobody else here seems to have a clue about.
BenAlabaster
Weapon release was on time, every time. First shot hits the target or your bullet back.
Tim Williscroft
+43  A: 

I haven't worked specifically in nuclear production, but I have ample experience in system development where environmental safety (and human safety, for that matter) is paramount. A lot of the development I have done in my career has been for use in this type of environment, whether Oil & Gas or hydro-electric production, and could even be used in nuclear facilities, although I've yet to have that final honour - thankfully, perhaps.

The large majority of these types of systems are developed using SCADA systems and some form of HMI control system - which is what the GUI is called in industrial systems. This is usually built in an IDE on a platform designed purely for this purpose - CygNet, Wonderware, iFIX, FactoryLink or similar.

Whenever you're coding for this type of environment, your first concern is failsafe behaviour. I will simplify to demonstrate my point (at the risk of being chastised by the SCADA community), but a system like this is controlled largely by hardware, with safety limits that are hard-wired first, firmware-controlled second and software-controlled third.

The hard-wired limits are the outside boundaries of safety. In the event that firmware or software fails and these limits are breached, the system automatically shuts down. For instance on an oil pipeline this might mean closing a valve on a well to prevent an explosion at one end, or may mean venting excess to atmosphere or a burner if necessary.

Firmware limits are usually predetermined safety limits, considered safe for general use to push the system to.

Software is then used by an operator who will tweak the system to get the best possible performance or to meet other business targets - i.e. most power, coolest operating temperature, optimal performance etc.

In the event that anything fails, the underlying system takes over and operates safely. This means that if the application fails catastrophically, the firmware built into the hardware controls can still operate the system safely. If the firmware fails, the hardware faults safely - i.e. shutting the system down to prevent environmental or human catastrophe.
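The layering this answer describes, hard-wired limits outside firmware limits outside operator setpoints, can be sketched as nested range checks. Every name and number below is invented for illustration:

```python
# All numbers are invented for illustration.
HARD_LIMIT = (0.0, 400.0)       # hard-wired trip range; a breach shuts everything down
FIRMWARE_LIMIT = (10.0, 350.0)  # firmware clamps any software command to this band

def command_setpoint(requested):
    """Operator software may request anything; the firmware layer
    clamps the request, and the hard-wired layer trips the plant
    if a value ever lands outside its range (unreachable here by
    design: the firmware band sits inside the hard limits)."""
    lo, hi = FIRMWARE_LIMIT
    clamped = min(max(requested, lo), hi)
    if not (HARD_LIMIT[0] <= clamped <= HARD_LIMIT[1]):
        raise SystemExit("hard-wired trip")
    return clamped

print(command_setpoint(500.0))  # over-eager operator -> firmware yields 350.0
```

The design point is defence in depth: the software layer can be arbitrarily wrong and the outer layers still bound the damage.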

BenAlabaster
+5  A: 

You might wish to read the book SafeWare by Nancy Leveson, it has some good case studies for software and dealing with preventing hazards.

Jason S
Safeware is good. People should read it.
Tim Williscroft
+1  A: 

This posting discusses safety critical software with quite a lot of fan-out links.

ConcernedOfTunbridgeWells
+1  A: 

CERN uses LabVIEW to control the LHC. If LabVIEW is good enough to recreate black holes, Higgs bosons and the Big Bang, I'm sure it can handle wimpy ole' nuclear fission. :)

PaulG
Did it manage to create Higgs Bosons?
BenAlabaster
Not yet. Bozos keep breaking the damn thing. :)
PaulG
+7  A: 

There are standards and certifications for software development of safety-critical systems:

  • Avionics: DO-178B
  • Industrial: IEC 61508
  • Nuclear: IEC 61513 / IEC 62138
  • Railway: EN 50128

...and so on. Most of these overlap heavily with variations on CMMI-like processes, restrictions on language usage, requirements for fault analysis, diagnostics, fail safe states, etc.

So, in short, developing software for safety critical systems is not something you need to figure out on your own.

JeffP
+35  A: 
Michelle
Thanks for that. Did you actually mean core == actual magnetic cores?
John Saunders
Probably. This is in a nuclear power plant. Consider what gamma rays do to semiconductor memory.
Windows programmer
Come to think of it, since this is in a Canadian nuclear power plant, how did they get permission to omit half of the official languages from mission-critical safety controls, and why would they want to?
Windows programmer
+1: Impressive answer
Stefano Borini
I would guess that core is not used because of its (alleged) resistance to nuclear energy. More likely, it was state-of-the-art and known to be reliable at the time the nuclear plant was designed.
Barry Brown
What were the reasons for using assembler?
Jason
+10  A: 

Not exactly a nuclear power plant but similarly mission critical is software for manned space missions.

The NASA mission-critical software process works off four propositions:

  1. The product is only as good as the plan for the product.
  2. The best teamwork is a healthy rivalry.
  3. The database is the software base.
  4. Don't just fix the mistakes -- fix whatever permitted the mistake in the first place.

Consider these stats: the last three versions of the program -- each 420,000 lines long -- had just one error each. They must be doing something right.

There is a very good article explaining these propositions here:

"They Write the Right Stuff"

Obviously this cost a lot of money!

Pablojim
Coincidence that they had one error each? Do you think it was the same error? Does their version control system finger the person who wrote that error? :P
BenAlabaster
Maybe they forgot to merge the fix back into trunk...
Pablojim
"Maybe they forgot to merge the fix back into trunk". Nope. They might make that mistake once but they wouldn't allow themselves to make that mistake a second time. "4. Don't just fix the mistakes -- fix whatever permitted the mistake in the first place."
Windows programmer
#1 is an impossible, infinitely-recursive proposition
Aidan Ryan
A: 

I understood that for some space missions in the past the CLIPS expert system was used to control launching, etc. Programming languages like Lisp, which keep you in a state of flow, are also considered safe for safety-critical applications.

Tiberiu Hajas
A: 

Not really a practice, but a strong OS might be a start: http://en.wikipedia.org/wiki/VxWorks

cwap
A: 

I once heard at university that some coders aren't allowed to compile or run their own code. They have to work out for themselves whether it works as it ought to, and only then does it get compiled and tested.

Andrew Grimm
+1  A: 

During the landing of the first space shuttle mission (STS-1) all five redundant computers failed (due to a hardware fault). Mission commander John Young took over manual control for the landing.

So the lesson is... always have a manual override.

David Plumpton
A: 

Great post - I enjoyed reading it. Here is a useful Canadian standard for nuclear power plants: N290.14-07, "Qualification of pre-developed software for use in safety related instrumentation and control applications in nuclear power plants". It describes which standards to use when deciding whether software can be installed in safety systems. For example, if you can get something that is IEC SIL level 4, then yes, it's good to go into the safety system.

+1  A: 

No garbage collection, no dynamic memory allocation, no multitasking and no multithreading. Everything must be stupidly safe.
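One way to honour "no dynamic memory allocation" is to size everything at startup. A sketch of a preallocated pool follows (illustrative only, and in Python rather than the C such systems actually use; the class name and slot size are invented):

```python
class StaticPool:
    """Every buffer allocated up front; acquire/release after
    construction never touch the allocator.  A Python imitation of
    the static-allocation discipline embedded C code enforces."""
    def __init__(self, size):
        self._slots = [bytearray(64) for _ in range(size)]
        self._free = list(range(size))

    def acquire(self):
        if not self._free:
            raise RuntimeError("pool exhausted: capacity fixed at design time")
        return self._free.pop()

    def release(self, slot):
        self._free.append(slot)

pool = StaticPool(4)
slot = pool.acquire()   # no allocation happens here
pool.release(slot)
```

Exhaustion becomes a design-time sizing error that surfaces deterministically, instead of an unpredictable allocation failure or GC pause at run time.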

iskandar
+1  A: 

Design Diversity (n-version programming):

Quoting "Choosing Effective Methods for Diversity":

Design diversity is a popular defence against design faults in safety critical systems. Design diversity is at times pursued by simply isolating the development teams of the different versions, but it is presumably better to “force” diversity, by appropriate prescriptions to the teams. There are many ways of forcing diversity.

Quoting "Simulating Specification Errors and Ambiguities in Systems Employing Design Diversity":

In n-version programming, different software versions, written to the same specification but developed independently, execute in parallel. It is imperative that there is no communication between the teams responsible for developing the different versions. Quarantining the different teams is essential so that misunderstandings from one team do not affect the understanding of other teams. But quarantining teams is not always enough: uncorrelated faults in distinct versions can lead to identical failures. [...] Many people have written off n-version programming as a dead approach to attaining high-integrity software because of the n-version problem. But to our amazement, n-version programming is alive and well in several different safety-critical domains, and it is particularly popular outside of the United States.

Quoting "Research on Diversity and Software Fault Tolerance at the Center for Software Reliability":

The use of diversity – doing things differently, in two or more ways, to protect against the failures of single procedures – has been ubiquitous in safety-critical industries for decades. In many of these applications, the benefits have been regarded as ‘obvious’, and it is only in more recent years that there have been formal models and studies of efficacy. [...] More recently (in the past 25 years) there has been considerable interest in the use of diversity in software-based systems. A driver for this research was the need for very highly reliable software, coupled with the realisation that there were severe difficulties in making a single version of a program very reliable (e.g. via reliability growth from extensive testing and debugging) (Miller, Morell et al. 1992; Littlewood and Strigini 1993). The use of multi-version software, developed independently and adjudicated at run-time, seemed a possible way out of the difficulties: early work in the field was probably motivated by an analogy with hardware redundancy. [...] There are some early applications of software diversity that appear to have been successful: examples include critical flight-control computers on Airbus aircraft (Briere and Traverse 1993); various railway signalling and control systems, see e.g. (Hagelin 1988). After experiencing many years of operational use, there seem to be no reports of catastrophic failure of these systems attributable to software design faults.

Quoting "Digital Avionics: A Computing Perspective":

Software is intangible, so it cannot exhibit degradation faults. Rather, software failures are necessarily due to design faults that cannot be masked through simple replication due to the lack of failure independence. Design diversity is a popular technique that attempts to overcome this difficulty by employing arrays of redundant components, each with a dissimilar design or implementation. Airbus and Boeing both use design diversity in their flight control systems but in different ways.

Airbus employs a design diversity technique called multiversion programming or N-version programming [3]. In multiversion programming, several system implementations are prepared from the same set of requirements by different developers under the presumption that the designs prepared by each developer will be independent—that is, the probability that one implementation will fail on a particular set of inputs given that another implementation has failed on those inputs is equal to the probability of the implementation's failing alone. The various implementations are then assembled into a classical redundancy architecture in which they are run in parallel on the same inputs and their outputs are passed into a voter to check agreement. If a design fault is activated in one of the implementations, then, according to the theory, it is unlikely that the other implementations will also possess the fault and should continue to function. Clearly, the assurance that can be placed on multiversion programming rests on the assumption of design independence, and evidence exists that this assumption does not hold for all types of software systems [7].

The Boeing 777 FCS was not developed through multiversion programming but rather by employing diversity in the microprocessor architecture. Boeing compiled the software for the 777 FCS for multiple machine architectures and runs each version in tandem during system operation. This approach allows the 777 FCS to tolerate design faults in a specific microprocessor as well as those introduced during compilation. It does not, however, provide any resilience to faults resulting from errors in the common source code from which the versions were built [13].
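The n-version scheme the quotes describe, independent implementations of one spec adjudicated by a voter, can be sketched like this (the versions and the voter are hypothetical toys, not avionics code):

```python
from collections import Counter

def adjudicate(versions, inputs):
    """Run each independently developed version on the same inputs
    and return the majority output; a minority design fault is
    masked, while a tie or disagreement is an error."""
    outputs = [version(*inputs) for version in versions]
    winner, votes = Counter(outputs).most_common(1)[0]
    if votes * 2 <= len(outputs):
        raise RuntimeError("no majority: versions disagree")
    return winner

# Three "teams" implement the same spec: absolute value.
v1 = lambda x: abs(x)
v2 = lambda x: -x if x < 0 else x
v3 = lambda x: x  # team 3 forgot negatives: a design fault, outvoted below
print(adjudicate([v1, v2, v3], (-3,)))  # -> 3
```

As the quoted papers stress, the whole scheme rests on the versions failing independently; if all three teams share a misunderstanding of the spec, the voter happily ratifies the common fault.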

none
Could you please elaborate on why these quotes are interesting?
John Saunders