Any concrete suggestions for computing application/system reliability?

A:

For predicted reliability, you multiply together the predicted reliabilities of all critical components (this assumes their failures are independent, which is not generally a safe assumption). With non-critical components, you have to work out whether they group together to form a critical subsystem, or whether there is some characteristic time they can be down for before becoming critical, or … well, in short, you just have to analyze very carefully.
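A minimal sketch of that arithmetic, assuming independent failures. Critical components combine in series (every one must work, so reliabilities multiply); a redundant group of non-critical components that together form one critical unit combines in parallel (the group fails only if all members fail). The function names and component figures are hypothetical, chosen only for illustration; this does not model the "can be down for a while" case, which needs a time-based model.

```python
from math import prod

def series_reliability(reliabilities):
    """All components are critical: the system works only if every one works.
    Assumes component failures are independent."""
    return prod(reliabilities)

def parallel_reliability(reliabilities):
    """Redundant group: it works if at least one member works.
    Assumes member failures are independent."""
    return 1 - prod(1 - r for r in reliabilities)

# Hypothetical figures: a web server and a database (both critical),
# plus two redundant load balancers that together form one critical unit.
web = 0.999
db = 0.9995
lb_pair = parallel_reliability([0.99, 0.99])   # 1 - 0.01 * 0.01 = 0.9999
system = series_reliability([web, db, lb_pair])
print(f"{system:.6f}")
```

Note that the overall figure is lower than that of the weakest critical component, which is why a single unreliable critical dependency dominates the result.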

But predicted reliability is not the same as measured reliability! If you're at all serious about this (and 99.9999% reliability is generally very serious stuff), then you're going to have to measure it, and you need to work out very carefully both what to measure and from what perspective. There's no point in measuring website availability from within the same cluster if the deployment's characteristic problem is off-site network bandwidth.
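As a rough illustration of the measuring side, the sketch below computes availability as the fraction of successful probes, and converts an availability target into a yearly downtime budget. The probe log is hypothetical; the point from the answer is that the probes should run from the users' vantage point (e.g. off-site), not from inside the same cluster. The helper names are assumptions for illustration only.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years

def measured_availability(probe_results):
    """Fraction of probes that succeeded. probe_results is a list of booleans
    (True = the service answered correctly from the probe's vantage point)."""
    if not probe_results:
        raise ValueError("no probe results")
    return sum(probe_results) / len(probe_results)

def downtime_budget_seconds(target, period_seconds=SECONDS_PER_YEAR):
    """Maximum total downtime permitted by an availability target over a period."""
    return (1 - target) * period_seconds

# Hypothetical probe log from an external vantage point: 1 failure in 10,000 checks.
probes = [True] * 9999 + [False]
print(measured_availability(probes))                # 0.9999
print(round(downtime_budget_seconds(0.999999), 1))  # 31.5 (seconds per year)
```

The second figure shows why 99.9999% is serious: it allows only about half a minute of downtime per year, so the measurement itself must be frequent and trustworthy enough to detect outages that short.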

Donal Fellows