views:

93

answers:

3

Hi!

I have to track if given a week full of data integers ( 40, 30, 25, 55, 5, 40, etc ) raise an alert when the deviation from the norm happens (the '5' in the above case). An extra nice thing to have would be to actually learn if 5 is a normal event for that day of the week.

Do you know an implementation in ruby that is meant for this issue? In case this is a classic problem, what's the name of the problem/algorithm?

+2  A: 

It's a very easy thing to calculate, but you will need to tune one parameter. You want to know if any given value is X standard deviations from the mean. To figure this out, calculate the standard deviation (see Wikipedia), then compare each value's deviation abs(mean - value) from the mean to this value. If a value's deviation is say, more than two standard deviations from the mean, flag it.

Edit:

To track deviations by weekday, keep an array of integers, one for each day. Every time you encounter a deviation, increment that day's counter by one. You could also use doubles and instead maintain a percentage of deviations for that day (num_friday_deviations/num_fridays) for example.

David Kanarek
How could this be extended to learn about days of week?
MB
A: 

The name of the algorithm could be as simple as "calculate standard deviation."

http://en.wikipedia.org/wiki/Standard_deviation

However, any analysis you do should be specific to the data set. You should inspect historical data to get at the right algorithm. Standard deviation won't be a good measure at all unless your data is normally distributed. Your data might even be such that you just want to look for numbers above a certain max value... it really depends.

So, my advice to you is:

1) Google for statistics overview and read up on basic statistics.

2) Inspect any historical data you have.

3) Come up with some reasonable measure of an odd number.

4) Test your measure against your historical data and see if it highlights the numbers you think it should.

5) Repeat steps 2-4 as necessary to refine your algorithm.

Andrew Johnson
#3: (n % 2) == 1 :P
klochner
+1  A: 

http://en.wikipedia.org/wiki/Control_chart describes classical ways of doing this sort of thing. As Jonathan Feinberg commented, there are different approaches.

Darius Bacon