views:

385

answers:

5

Judging by the general trend of comments on my question about building an aircraft using Agile, the biggest concern other than cost appears to be safety.

Do people feel that it is not possible to build a safe system (or prove that it is safe) using Agile? Doesn't all the iterative testing mitigate this? Is software developed using Agile likely never to be as reliable as its counterparts developed using waterfall?

+2  A: 

There are a number of high-profile software failures that illuminate this issue. In particular, Ariane 5 Flight 501 and the Therac-25 are both examples of software failures that bring this problem into sharp relief. The Ariane 5 rocket veered off its flight path 37 seconds after launch due to an integer overflow in the guidance software. The accident cost $370 million in lost equipment, but there was no loss of life. The same cannot be said of the Therac-25, a medical machine that killed several people with lethal doses of radiation.

Could these problems have been prevented with a better software methodology? I'm not so sure. The management decisions that contributed to the failure of the Ariane 5 had little to do with the manner in which the software was constructed, and the Therac-25 investigation was hampered by the belief that it was not possible for the machine to fail.

Better testing methodologies could have helped. It's hard to believe that a good statically typed compiler would have failed to find the integer overflow. Newer testing tools such as Pex, with its built-in theorem prover, can search for corner cases, and could have identified the sensor anomalies that existed in the Therac-25.
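To make "searching for corner cases" concrete, here is a minimal sketch in Python. It is a plain brute-force scan, far cruder than the constraint-solving approach Pex takes, and it is not the Ariane code. `to_int16_wrapping` is a hypothetical helper that emulates an unchecked narrowing conversion to a 16-bit signed integer, silently wrapping out-of-range values the way two's complement truncation does. The scan reports the first input whose value changes after conversion.

```python
def to_int16_wrapping(x: float) -> int:
    """Hypothetical unchecked conversion: truncate toward zero, keep the low 16 bits
    and reinterpret them as a two's complement signed value (no range check)."""
    n = int(x) & 0xFFFF  # int() truncates toward zero; the mask keeps 16 bits
    return n - 0x10000 if n >= 0x8000 else n


def first_lossy_input(lo: int, hi: int):
    """Scan lo..hi-1 and return the first input the conversion does not preserve."""
    for x in range(lo, hi):
        if to_int16_wrapping(float(x)) != x:
            return x
    return None  # no corner case found in this range


print(first_lossy_input(0, 100_000))  # 32768
print(to_int16_wrapping(32768.0))     # -32768
```

A hand-written test suite that only exercises "typical" values below 32,767 would pass; the mechanical search finds the boundary immediately.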

But no technique is reliable unless you have an uncompromising commitment to safety, from the very highest levels of management all the way down to the people who box the product for shipment.

Robert Harvey
The compiler didn't fail to find the integer overflow. The overflow checking for that line of code was deliberately disabled, because multiple independent calculations *proved* that it was physically and structurally impossible for the rocket to get into a state where that value would overflow – it would have been ripped to shreds long before that. So, how *did* the overflow occur, then, if it was impossible? Wait, did I forget to mention that those calculations were for the much smaller and less powerful Ariane 4? Well, so did the engineers who made those calculations …
Jörg W Mittag
IOW, the problem was that the flight control software of the Ariane 4 was reused in the Ariane 5 (a good idea in principle, to reuse a stable, proven, tested piece of software), but nobody checked whether all the assumptions that had been made were still valid under the new specs.
Jörg W Mittag
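One way to keep the assumptions behind a reused component from being forgotten is to record them as data next to the component and check every new host system against them before reuse. The sketch below assumes this approach; `Envelope`, its fields, `unchecked_assumptions` and all the numbers are hypothetical and invented for illustration, not taken from either Ariane vehicle.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Envelope:
    """Upper bounds on physical quantities (hypothetical fields and units)."""
    max_horizontal_velocity: float
    max_acceleration: float


def unchecked_assumptions(validated_for: Envelope, new_vehicle: Envelope) -> list[str]:
    """List every bound the new vehicle exceeds relative to the envelope
    the reused component was originally validated against."""
    problems = []
    for f in fields(Envelope):
        assumed = getattr(validated_for, f.name)
        required = getattr(new_vehicle, f.name)
        if required > assumed:
            problems.append(f"{f.name}: validated up to {assumed}, new vehicle needs {required}")
    return problems


# All numbers below are invented for illustration only.
component_validated_for = Envelope(max_horizontal_velocity=20_000.0, max_acceleration=50.0)
old_vehicle = Envelope(max_horizontal_velocity=15_000.0, max_acceleration=40.0)
new_vehicle = Envelope(max_horizontal_velocity=45_000.0, max_acceleration=40.0)

print(unchecked_assumptions(component_validated_for, old_vehicle))  # []
print(unchecked_assumptions(component_validated_for, new_vehicle))
```

The check is trivial; the value lies in forcing the assumptions to be written down where a reuse decision cannot skip them.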
But Ariane 5 and Therac-25 weren't Agile projects, I don't think, so I'm confused about how this relates.
Dean J
@DeanJ He points out that the software methodology was not the cause of the issue; the cause was a general failure of commitment to safety.
Jweede
A: 

Most safety-critical or mission-critical systems can benefit from a more standard development structure such as the waterfall model and formal code reviews. Such methods help maintain a more structured code base. A great book about software construction, especially if the project has already begun using an Agile process, is Code Complete, 2nd ed.

BenHayden
Do we feel that good practices combined with refactoring under Agile still fail to keep the code as well structured?
Ben Breen
I guess it's fair to say that agile processes are intended for agile applications, i.e. line-of-business applications. Critical systems potentially benefit from more robust processes, such as waterfall. OTOH, line-of-business applications written with the waterfall model tend to collapse under their own weight.
Robert Harvey
+1  A: 

Agile is a dynamic development model: you use it when the requirements of your application are expected to change quickly and unpredictably, and when your team is still small enough to count on your fingers. You do NOT use it just because it is modern/in/hip/cool.

Sure, you can find errors with unit tests, but they can never prove their absence. Changing or adding requirements during development greatly increases the risk of introducing hidden errors.
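The gap between "tests pass" and "no errors exist" can be shown with a tiny example. The sketch below assumes a hypothetical `abs16` function that computes an absolute value in wrapping 16-bit two's complement arithmetic, as a C-style `int16_t` computation would. A handful of reasonable-looking unit tests all pass, yet checking every one of the 65,536 possible inputs finds an input whose "absolute value" is negative. Exhaustive enumeration amounts to a proof only because this domain is tiny; realistic systems need other verification techniques.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def abs16(x: int) -> int:
    """Hypothetical absolute value in wrapping 16-bit two's complement arithmetic."""
    r = -x if x < 0 else x
    r &= 0xFFFF  # keep only 16 bits, as fixed-width hardware would
    return r - 0x10000 if r >= 0x8000 else r


# Typical hand-picked unit tests: every one of these passes.
for value, expected in [(0, 0), (5, 5), (-5, 5), (INT16_MAX, INT16_MAX), (-INT16_MAX, INT16_MAX)]:
    assert abs16(value) == expected

# Exhaustive check over the entire 16-bit input domain.
failures = [x for x in range(INT16_MIN, INT16_MAX + 1) if abs16(x) < 0]
print(failures)  # [-32768]
```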

For precisely planned applications, which are typical of safety-critical systems, you want to use a more static development model such as waterfall or the V-model.

codymanix
+5  A: 

Agile is a method of managing a project, not a method of testing or verifying the safety of a finished project.

A safety-critical system would still need extensive testing after it is functionally complete to be absolutely sure it is actually up to the task. I would expect this sort of work to be given over to a separate team of testers who are specifically focussed on such testing.

Agile is good with soft requirements, where the traditional product lifecycle is long enough for the business goals to have changed; in a safety-critical environment, though, I think that rapidly changing or under-specified requirements would be A Very Bad Thing.

I don't buy the idea that using waterfall would in some way give the code some intrinsic order or stability - if the individual sprints are well managed, the code tested and reviewed, then the shorter cycle would produce code of equal quality, just in chunks.

Using Scrum gives you a heads-up earlier in the project timeline when things are getting problematic; it's not going to do anything but remove hiding places for poorly performing managers / devs / whoever.

In short, it is possible to build any sort of system using Agile methods, as long as you don't expect Agile to test what you have built. That's for the testers.

Mr. Matt
+2  A: 

The WHOLE point about safety-critical systems that everyone seems to be missing here is that, because of their potential to cause loss of life (sometimes on a large scale), they have to be proven correct. Often they require an operating certificate from a licensing authority that is satisfied that the system's requirements have been accurately specified (often using formal methods like Z and VDM), that the design is a reflection of those requirements (and can be proven to be such), and that the code is an accurate reflection of the design (again, proven to be so).

More often than not, the option to test the product to provide such proof doesn't exist (OK guys, let's go live with the nuclear reactor, Boeing 777, Therac-25, whatever, and see where the bugs are). Safety-critical systems (especially those categorised at SIL 3 and above) need thorough and comprehensive documentation, which is totally at odds with the Agile manifesto in all respects as far as I can see. Programmers of safety-critical systems are not even allowed to make changes to the released software without requesting a new re-evaluation of the software from the licensing authority, and rightly so, given the rigour that goes into proving the correctness of the first version and the catastrophic implications of a screw-up.

paul