Recently I ran into this question: what is continuity, and how can it be measured? Usually the term is used alongside availability. For example, if our server was accessible for 22 hours in a day, its availability is 22/24 ≈ 91.7%. But how do you measure continuity?
The fact that it was available for 22 of the 24 hours doesn't tell you anything about the composition of the uptime and downtime.
Example 1: Up for 22 hours straight, down for 2 hours.
Example 2: Up for 11 hours, down for 1. Up for another 11 hours, down for 1.
In the second example you still have ~91.7% availability, but continuity would be half that of the first example, because the server was only ever up for 11 hours at a stretch during the 24-hour period.
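To make that concrete, here's a minimal Python sketch (the interval representation is just something I made up for illustration) that computes availability and one possible continuity measure, the longest uninterrupted up-interval, for both examples:

```python
# Minimal sketch: compare availability vs. continuity for the two examples.
# The input format (a list of up-interval lengths in hours) is an assumption
# made for illustration, not anything from the original question.

def availability(up_intervals, total_hours):
    """Fraction of the period the server was up, regardless of composition."""
    return sum(up_intervals) / total_hours

def longest_up_stretch(up_intervals):
    """One possible 'continuity' measure: the longest uninterrupted uptime."""
    return max(up_intervals)

example_1 = [22]        # up 22h straight, then 2h down
example_2 = [11, 11]    # up 11h, down 1h, up 11h, down 1h

for name, ups in [("example 1", example_1), ("example 2", example_2)]:
    print(name,
          f"availability={availability(ups, 24):.1%}",
          f"longest up stretch={longest_up_stretch(ups)}h")

# Both print availability=91.7%, but the longest up stretch is 22h vs. 11h.
```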
We used software called 'BigBrother' about four years ago. It would track our server assets, and then provide a detailed graph of the server's availability.
Network Monitoring software is probably your best bet for acquiring this kind of information on your own network.
I believe it would 'ping' the server every minute, and allow for 3 consecutive failures before registering a server as down. We had scripts that would page certain employees when a machine was identified as 'down'. Overall, we were very happy with its performance.
BigBrother used to be freeware, but they have since gone to some other licensing scheme.
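If you ever want to roll a quick version of that check yourself, a rough sketch of the ping-every-minute / three-consecutive-failures idea might look like the following. This is not BigBrother's actual implementation; the host name, interval, and threshold are placeholders:

```python
# Rough sketch of the "ping every minute, 3 consecutive failures = down" idea.
# NOT how BigBrother works internally; host, interval and threshold are
# placeholder values for illustration only.
import subprocess
import time

HOST = "server.example.com"   # hypothetical host
CHECK_INTERVAL = 60           # seconds between checks
FAILURE_THRESHOLD = 3         # consecutive failures before declaring "down"

def is_reachable(host):
    # One ICMP echo request; True if the host answered (Linux/macOS ping flags).
    result = subprocess.run(["ping", "-c", "1", host],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0

consecutive_failures = 0
while True:
    if is_reachable(HOST):
        consecutive_failures = 0
    else:
        consecutive_failures += 1
        if consecutive_failures == FAILURE_THRESHOLD:
            print(f"{HOST} is DOWN")  # here you would page someone instead
    time.sleep(CHECK_INTERVAL)
```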
It might just be semantics.
- Continuity/Reliability = total uptime.
- Availability = total uptime when people want to use it.
If your server goes offline at 3am when no one is trying to use it, that decreases reliability but not availability.
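To illustrate that distinction, here's a small sketch (the business-hours window and the outage list are assumptions, not anything from your setup) where a 3am outage hurts reliability but leaves availability untouched:

```python
# Sketch of the reliability-vs-availability distinction above.
# Assumption: "when people want to use it" means 08:00-20:00; outages are
# given as (start_hour, end_hour) pairs within a single 24-hour day.

BUSINESS_HOURS = (8, 20)   # hypothetical window when users actually care
outages = [(3, 4)]         # down from 03:00 to 04:00

def overlap(a, b):
    """Hours of overlap between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

business_len = BUSINESS_HOURS[1] - BUSINESS_HOURS[0]
total_downtime = sum(end - start for start, end in outages)
business_downtime = sum(overlap(o, BUSINESS_HOURS) for o in outages)

reliability = (24 - total_downtime) / 24                      # counts the 3am outage
availability = (business_len - business_downtime) / business_len  # ignores it

print(f"reliability={reliability:.1%}, availability={availability:.1%}")
# -> reliability=95.8%, availability=100.0%
```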
|----------|#|-----|#|--------------------|
0          10h     15h                  35h
'-' means available
'#' means down (system failure)
So, can continuity be measured as (10h + 5h + 20h) / 3 ≈ 11.7h, i.e. the average length of an available interval?
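If that's the measure you settle on, it's trivial to compute; for instance (the interval list just encodes the timeline above):

```python
# Sketch of the proposed continuity measure: the mean length of the
# available intervals from the timeline above.
up_intervals = [10, 5, 20]   # hours of uninterrupted availability

mean_continuity = sum(up_intervals) / len(up_intervals)
print(f"average up-interval = {mean_continuity:.1f}h")   # -> 11.7h
```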
mean time to failure (mttf) - the inverse of the failure frequency (non-repairable systems)
mean time between failures (mtbf) - the inverse of the failure frequency (repairable systems)
mean time to repair (mttr) - how long it takes to fix (in your case, to bring the server back online)
These are standard definitions, and there is plenty of operations research on this.
average "continuity" = mean time between failure
mtbf/(mtbf+mttr) = availability
natural definition - you can use what ever "it" is for mtbf time out of mtbf+mttr total time
Note that running machines in parallel decreases mtbf (there are two machines, so more failures) but also drives mttr down to essentially the switchover time, so mttr -> 0 (presumably) and availability goes up.
So, to measure what you call "continuity" (actually mtbf): divide the total uptime by the number of failures. Do not include the time spent bringing the device back online, i.e. the unavailable time = repair time.
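Putting those definitions together, here is a short sketch (the cycle data is invented for the example) that derives mtbf, mttr, and availability from alternating uptime/repair durations:

```python
# Sketch: derive mtbf, mttr and availability from alternating up/down
# durations, per the definitions above. The cycle list is invented data.

# (uptime_hours, repair_hours) for each failure cycle
cycles = [(10, 1), (5, 1), (20, 2)]

total_uptime = sum(up for up, _ in cycles)
total_repair = sum(rep for _, rep in cycles)
failures = len(cycles)

mtbf = total_uptime / failures          # mean time between failures
mttr = total_repair / failures          # mean time to repair
availability = mtbf / (mtbf + mttr)     # = total_uptime / (uptime + repair)

print(f"mtbf={mtbf:.1f}h  mttr={mttr:.1f}h  availability={availability:.1%}")
# -> mtbf=11.7h  mttr=1.3h  availability=89.7%
```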