ansaurus

Question

Calculating timespans for outages when I have only the results of failed calls

Answer 1

+3 A:

It sounds like you basically need to guess at some maximum time between visits. So if you actually only had one visit every 10 days, then everything in that table could represent a single outage... but it's rather likely that it didn't.

So guess at a reasonable value - e.g. 5 minutes (it would be unusual to not have any hits in 5 minutes, and unusual for two separate outages to occur within 5 minutes of each other). Then find any gap between two values (sorted into chronological order, of course) where the gap is greater than that time interval. Those records will indicate the end of one outage and the start of the next.

Exactly how you do that will depend on your environment - I know how I'd do it in C#, but I wouldn't presume to try it in straight SQL, for example :)
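As an illustration of the gap-based grouping described above, here is a minimal Python sketch. It assumes the failed calls are available as a list of `datetime` values. The function name `group_outages` and the 5-minute default are hypothetical choices for illustration, not something taken from the original question.

```python
from datetime import datetime, timedelta


def group_outages(failure_times, max_gap=timedelta(minutes=5)):
    """Group failure timestamps into (start, end) outage spans.

    Two consecutive failures belong to the same outage when they are
    no more than max_gap apart. max_gap is a guess that must be tuned
    to the site's normal traffic.
    """
    outages = []
    start = prev = None
    for t in sorted(failure_times):  # chronological order
        if start is None:
            # First failure opens the first outage.
            start = prev = t
        elif t - prev > max_gap:
            # Gap too large: close the current outage, open a new one.
            outages.append((start, prev))
            start = prev = t
        else:
            # Still inside the current outage.
            prev = t
    if start is not None:
        # Close the final outage.
        outages.append((start, prev))
    return outages
```

After the sort, this is a single pass over the list, so each record is inspected once. An outage made up of a single failed call comes back with identical start and end times.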

Jon Skeet 2010-08-02 21:54:01

On further thought, my question is whether I'm going to have to iterate through the entire list (which could get long) and decide if each value constitutes the beginning/middle/end of an outage, and it sounds like I will. Thanks for your help.

rwmnau 2010-08-02 22:00:21

Answer 2

+1 A:

Unless you have some other information about how often the server gets hits, you can't answer the question you are trying to answer.

And even if you have that data, a rigorous analysis of server outages would not be easy:

If you have information about how often the site has historically been hit in a certain time interval (say 6am-7am on a Monday), you could model the probabilities of server failures using a Poisson process and fit it to your data for that interval. That will give you the likelihood of an outage in that time interval, and if you model the length of an outage correctly (or guess it well), you could get an expected duration of all outages in a given day.
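One simple way to use a Poisson model here, offered as a sketch rather than the full analysis described above: if hits during an interval arrive as a Poisson process with rate λ per second, the probability of seeing no hits at all for t seconds is e^(-λt). That can help judge whether a quiet gap between two failures is plausible under normal traffic. The function names and the assumption of a constant rate within the interval are hypothetical.

```python
import math


def prob_no_hits(rate_per_second, gap_seconds):
    """Probability of zero hits in gap_seconds under a Poisson process.

    Assumes a constant hit rate within the interval being modelled.
    """
    return math.exp(-rate_per_second * gap_seconds)


def gap_for_probability(rate_per_second, target_prob):
    """Gap length (seconds) at which a hit-free stretch has probability target_prob.

    Inverts prob_no_hits: solves exp(-rate * t) = target_prob for t.
    """
    return -math.log(target_prob) / rate_per_second
```

For example, with a historical rate estimated for a given hour, `gap_for_probability(rate, 0.001)` gives a gap length that would only rarely occur by chance in normal traffic, which is one way to pick the threshold between separate outages.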

For most applications, it would be much simpler and more accurate to implement the check process you mentioned in your post.

Assaf 2010-08-02 21:58:29

The website is for a major ISP, so length between visits isn't a concern - most of the day, it's less than a second, though it can be minutes overnight. I'll probably end up walking the list and deciding, for each value I have, whether it's significantly tied to the start/end of an outage. Thanks!

rwmnau 2010-08-02 22:02:40
