views:

1080

answers:

5

I need to know if a number compared to a set of numbers is outside of 1 stddev from the mean, etc..

A: 

I would look at the C# Math class.

Jonnie Drew
The .NET System.Math class doesn't have functions for Standard Deviation or Mean.
Bevan
+2  A: 

Here is code to calculate a standard deviation for a set of numbers. Once you have that, that gives you an upper and lower bound to compare the number to.

McWafflestix
A: 

You can avoid making two passes over the data by accumulating the mean and mean-square

cnt = 0
mean = 0
meansqr = 0
loop over array
    cnt++
    mean += value
    meansqr += value*value
mean /= cnt
meansqr /= cnt

and forming

sigma = sqrt(meansqr - mean^2)

A factor of cnt/(cnt-1) is often appropriate as well.

BTW-- The first pass over the data in Demi and McWafflestix answers are hidden in the calls to Average. That kind of thing is certainly trivial on a small list, but if the list exceed the size of the cache, or even the working set, this gets to be a bid deal.

dmckee
Your formula is wrong. It should besigma = sqrt( meansqr - mean^2 )Read this page http://en.wikipedia.org/wiki/Standard_deviation carefully to see your mistake.
leif
@leif: Yep. And I should have noticed the dimensional problem, too.
dmckee
-1: mathematically correct, but numerically this is bad.
Jason S
@Jason: You are worried about loss of precision effects? Or what? I just don't see it....OK I followed the link on Jamie's answer. Loss of Precision it is. Point taken. ::shrug:: I'm an experimental scientist. We don't get populations with variations confined to the the 10^-9 level, and we generally use double precision for everything, so those populations we get with variation confined to the 10^-5 level come out OK anyway.
dmckee
+2  A: 

Code snippet:

public static double StandardDeviation(List<double> valueList)
{
    if (valueList.Count < 2) return 0.0;
    double sumOfSquares = 0.0;
    double average = valueList.Average(); //.NET 3.0
    foreach (double value in valueList) 
    {
        sumOfSquares += Math.Pow((value - average), 2);
    }
    return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}
Demi
Dividing by Count - 1 or Count depends on whether we're talking about entire population or sample, yes? Looks like OP is talking about a known population but not entirely clear.
John Pirie
That is correct - this is for sample variance. I appreciate the highlight.
Demi
Your code crashes for the legitimate case of a list with one value.
SPWorley
for a sample stddev you shouldn't be passing an list with one item. A sample stddev of one item is worthless.
Demi
edited in response to comment.
Demi
+14  A: 

While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance...

Plus, don't never, ever, ever, compute a^2 as pow(a,2), a * a is almost certainly faster.

By far the best way of computing a standard deviation is Welford's method. My C is very rusty, but it could look something like:

public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    double tmpM = 0.0;
    int k = 1;
    foreach (double value in valueList) 
    {
        double tmpM == M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    return Math.Sqrt(S / (k-1));
}

EDIT: I've updated the code accoridng to Jason's remarks...

Jaime
+1: I've read Knuth's commentary on this but have never known it was called Welford's method. FYI you can eliminate the k==1 case, it just works.
Jason S
OH: and you're forgetting the divide-by-N or divide-by-N-1 at the end.
Jason S
Now look what you've done. You've given me something new to learn. You beast.
dmckee
John Cook has a good article on standard deviation for large values at http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/, and a followup describing the reasons behind the algorithms at http://www.johndcook.com/blog/2008/09/28/theoretical-explanation-for-numerical-results/.
Emperor XLII
It actually is John's blog that I'm linking for a description of Welford's method...
Jaime