In my 10 years working as a software developer, I have often been amazed by how poor a lot of code is (and I am certainly guilty of writing such code myself). At times it can be good for a laugh, but it makes me wonder: what is different about the development of critical systems (aviation, medical, military, automotive, etc.) that yields such a markedly higher standard of code quality?

If anyone has any experience working in these sectors, I'd love to hear about differences in methodologies, tools, procedures, schedules - anything you think contributes to higher quality code.

+2  A: 

A lot of this software is written in languages like SPARK Ada, which can be formally proved to be correct.
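
To give a flavour of what such contracts look like: SPARK states preconditions and postconditions as part of the code and discharges them with a static prover before the program ever runs. A rough, hypothetical analogue in plain C (runtime assertions rather than static proof; the function and its bounds are invented for the example) might look like this:

```c
#include <assert.h>

/* Hypothetical sketch: a saturating add with an explicit contract.
 * In SPARK Ada the pre/postconditions below would be written as
 * aspects and proved statically; plain C can only approximate them
 * with runtime assertions. */
int saturating_add(int a, int b, int max)
{
    assert(max > 0);              /* precondition: sane upper bound */
    assert(a >= 0 && a <= max);   /* precondition: inputs in range  */
    assert(b >= 0 && b <= max);

    long long sum = (long long)a + b; /* widen to avoid overflow    */
    int result = (sum > max) ? max : (int)sum;

    assert(result >= 0 && result <= max); /* postcondition */
    return result;
}
```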

Basically, it comes down to spending more money and taking more time to develop the product.

Kevin
I agree, except for the more money and more time parts - the cost of software is ultimately dominated by testing and defect fixing, so fewer bugs up front can be cheaper. Likewise for time...
ja
Yes, but at some point you hit diminishing returns. For instance, most software teams would not spend two weeks debugging a random crash that happens in 0.001% of cases. In high-reliability systems, you will spend those two weeks.
Kena
+9  A: 

I work in medical imaging, and these are the main differences I see with the rest of the development world:

  • Code reviews for absolutely everything - including trivial changes.
  • A dedicated testing team, plus systematic unit tests (we're constantly working on improving our test infrastructure, automating more complex regression tests, etc.); see the test sketch after this list.
  • Traceability of every change, from requirements to the final test. That also means very atomic commits.
  • Strong code freeze phase, where every change and every bug fix must be approved by the team and project owner, no matter how trivial.
  • A lot of regulation-mandated documentation (sometimes on top of the actually useful project documentation).
  • Healthy paranoia of "unreproducible bugs" and other gremlins.
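
As a hypothetical sketch of what one of those systematic unit tests might look like in C (the function under test, clamp_dose, is invented for the example):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical sketch of a systematic unit test: every function gets
 * a small, automatically run test that pins down its contract, so a
 * regression shows up the moment the behaviour changes. */
static int clamp_dose(int requested, int max_safe)
{
    return requested > max_safe ? max_safe : requested;
}

static void test_clamp_dose(void)
{
    assert(clamp_dose(5, 10) == 5);    /* in range: passed through */
    assert(clamp_dose(15, 10) == 10);  /* over limit: clamped      */
    assert(clamp_dose(10, 10) == 10);  /* boundary: unchanged      */
}

int main(void)
{
    test_clamp_dose();
    puts("all unit tests passed");
    return 0;
}
```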

Basically, we try to follow the time-proven "best practices" and stay disciplined to avoid cutting corners. And yes, it's certainly more expensive that way.

Kena
+1  A: 
  • Hiring and training the developers (including the managers, testers, and customers)
  • System design for fault detection, tolerance, and recovery
  • Defined requirements
  • A mandate and a schedule that don't tell you to cut corners
  • Code inspections
  • Creating and running test cases
  • Phased deployment (to 'qualify' the software via beta and acceptance testing, before going into wide production).
ChrisW
+1  A: 
  • Testing
  • Use of specific languages and formalisms: Ada, B Method, Lustre, ...
  • Testing
  • Code reviews, Documentation review, Traceability
  • Testing
  • Conformance with strict standards: DO-178, CMMI level 3, ...
  • Testing
  • Use of hardware and software fault tolerance
  • Testing
  • Use of specific hardware
  • Testing
mouviciel
NPR - but kinda reminds me of The Lonely Island's "I'm The Boss". :)
Yuval A
+1  A: 

Sounds like you would be interested in Software Reliability Engineering.

Major Themes of SRE:

  • Design software to be fault tolerant, i.e., make a software system tolerant of faults that manifest themselves during operation. This emphasizes building into the software the means to detect, isolate, and recover from (or minimize the effects of) failures; see the sketch after this list.

  • Use analytical techniques that help prevent / discover / detect / gauge software system faults and vulnerabilities, thus leading to software designs with improved reliability.

  • Assess software reliability by the application of qualitative and, when possible, quantitative statistical methods. The goal is to base assessments and decisions on a foundation of experimentation, observation and empirical evidence.
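
As a minimal, hypothetical sketch of the first theme in C: triple modular redundancy computes the same value through three independent channels and votes on the results, so a single faulty channel is detected and masked (the channel values below are invented for the example).

```c
#include <stdio.h>

/* Hypothetical sketch of software fault tolerance via triple modular
 * redundancy (TMR): compute the same value through three independent
 * channels and vote. A single faulty channel is detected and masked. */
typedef struct {
    int value;   /* voted result                      */
    int healthy; /* 1 if at least two channels agreed */
} vote_result;

vote_result majority_vote(int a, int b, int c)
{
    vote_result r = { 0, 0 };
    if (a == b || a == c) { r.value = a; r.healthy = 1; }
    else if (b == c)      { r.value = b; r.healthy = 1; }
    /* else: all three disagree -- signal failure so the caller
     * can isolate the fault and fall back to a safe state.     */
    return r;
}

int main(void)
{
    /* Channel 3 returns a corrupted reading; the voter masks it. */
    vote_result r = majority_vote(42, 42, -1);
    printf("value=%d healthy=%d\n", r.value, r.healthy);
    return 0;
}
```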

At work there are groups that develop software for critical systems, so I was able to sit in on several classes offered on the subject.

The Handbook of Software Reliability Engineering is available for download.

Mark Robinson
+1  A: 

Most people don't like to hear it, but we're all predictable. You know this by heart: you meet an old friend and, before (s)he does anything, you know what (s)he'll do or say - the gestures, the way they laugh or pronounce words, and so on.

The same goes for our mistakes. In situation A, we always make the same mistake (until we learn). So what you can do is build a simulation of your development team, feed it with metrics for the jobs that need to be done, and it will tell you how many mistakes there will be in the final code and what kind they will be.

If you have enough data (say, a team working on the same problem for 10 years), you can start to rely on finding these bugs. This means: if the simulation tells you that there will be 10 bugs of kind X, you can rely on testing to find them (and consequently, you can delay shipment until all 10 are found and fixed).
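
A minimal sketch of that idea in C, assuming historical defect densities per category are available (every number and category name here is invented for illustration):

```c
#include <stdio.h>

/* Hypothetical sketch of process-based defect prediction: multiply
 * historical defect densities (defects per KLOC, per category) by
 * the estimated size of the new work to forecast how many bugs of
 * each kind testing should expect to find. All numbers are invented. */
struct defect_class {
    const char *kind;
    double per_kloc; /* historical defects per 1000 lines of code */
};

int main(void)
{
    const struct defect_class history[] = {
        { "logic errors",       0.8 },
        { "interface mismatch", 0.3 },
        { "off-by-one",         0.2 },
    };
    const double new_kloc = 12.0; /* estimated size of the new job */

    for (size_t i = 0; i < sizeof history / sizeof history[0]; i++) {
        double expected = history[i].per_kloc * new_kloc;
        printf("expect ~%.1f %s; don't ship until they are found\n",
               expected, history[i].kind);
    }
    return 0;
}
```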

More about this: "They Write the Right Stuff", an article about the team that developed the software for the Space Shuttle. At $1 billion apiece, you don't want any surprises...

Aaron Digulla
A: 

I've done some work on air traffic control systems. They use formal proofs to mathematically establish the correctness of some of the functions.

Sean Turner
A: 

IME the certification process is what creates the quality. Having a customer rep sit with you for hours on end, asking you to show how you ensure your code is bug-free, and then asking you to prove that you follow those processes, instills a certain level of quality.

Add to that a level of rigorous QC that wouldn't be seen in a typical commercial product, and you get software that is an order of magnitude better than the norm.

Obviously all this comes at a significant cost, which is why commercial software is generally only 'good enough', rather than 'as near to perfect as we can get'.

Visage
A: 

Some people use Formal Methods to verify that an algorithm is correct.
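
As a toy illustration of what that looks like in practice, here is a machine-checked proof in Lean 4 (the function and property are invented for the example): we define an absolute-value function and prove it never returns a negative number. Real systems prove far richer properties, but the workflow is the same: state the theorem, then discharge the proof so the checker verifies it mechanically.

```lean
-- Toy illustration of formal verification: define an absolute-value
-- function and machine-check that its result is never negative.
def absVal (x : Int) : Int :=
  if x < 0 then -x else x

-- `split` case-splits on the if-then-else; `omega` closes each case
-- by linear integer arithmetic.
theorem absVal_nonneg (x : Int) : 0 ≤ absVal x := by
  unfold absVal
  split <;> omega
```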

Artur Carvalho