I'm looking for some good articles on fault-tolerant software architectures. Could I please have some recommendations?

+1  A: 

It would be very difficult to sum this up in one article, since there are multiple ways to achieve fault tolerance in software. The principles apply to desktop applications, server applications, and/or SOA. There are also multiple methodologies, a few of which we already follow without knowing it; exception handling, for example. It would be a herculean feat to drill down into all the concepts in one article. You can find a lot of articles with a simple search on Google.

For my FYP, I researched OS-wide self-healing systems. I followed the Sun Solaris 10 architecture and IBM's Autonomic Computing research (http://www.research.ibm.com/autonomic/).

fasih.ahmed
+3  A: 

I found 'Release It!' to be an excellent read.

In Release It!, Michael T. Nygard shows you how to design and architect your application for the harsh realities it will face. You’ll learn how to design your application for maximum uptime, performance, and return on investment.

toolkit
+2  A: 

The Handbook of Software Reliability Engineering is available to read as a PDF. One of the main principles of software reliability is fault tolerance.

Take a look at chapter 14, Fault-Tolerant software.

Mark Robinson
+1  A: 

Link dump! :)

These are some of the online resources I got ideas from (or used to check terminology) when researching a certain aspect of redundancy.

ACM requires membership.

Henrik Gustafsson