ansaurus

Question

How do I determine the standard deviation (stddev) of a set of values?

Answer 1

A:

I would look at the C# Math class.

Jonnie Drew 2009-05-22 00:27:49

The .NET System.Math class doesn't have functions for Standard Deviation or Mean.

Bevan 2009-05-22 01:01:01

Answer 2

+2 A:

Here is code to calculate a standard deviation for a set of numbers. Once you have that, that gives you an upper and lower bound to compare the number to.

McWafflestix 2009-05-22 00:33:47

Answer 3

A:

You can avoid making two passes over the data by accumulating the mean and mean-square

cnt = 0
mean = 0
meansqr = 0
loop over array
    cnt++
    mean += value
    meansqr += value*value
mean /= cnt
meansqr /= cnt

and forming

sigma = sqrt(meansqr - mean^2)

A factor of cnt/(cnt-1) is often appropriate as well.

BTW-- The first pass over the data in Demi and McWafflestix answers are hidden in the calls to Average. That kind of thing is certainly trivial on a small list, but if the list exceed the size of the cache, or even the working set, this gets to be a bid deal.

dmckee 2009-05-22 00:44:46

Your formula is wrong. It should besigma = sqrt( meansqr - mean^2 )Read this page http://en.wikipedia.org/wiki/Standard_deviation carefully to see your mistake.

leif 2009-05-22 00:50:00

@leif: Yep. And I should have noticed the dimensional problem, too.

dmckee 2009-05-22 01:28:55

-1: mathematically correct, but numerically this is bad.

Jason S 2009-05-22 12:34:16

@Jason: You are worried about loss of precision effects? Or what? I just don't see it....OK I followed the link on Jamie's answer. Loss of Precision it is. Point taken. ::shrug:: I'm an experimental scientist. We don't get populations with variations confined to the the 10^-9 level, and we generally use double precision for everything, so those populations we get with variation confined to the 10^-5 level come out OK anyway.

dmckee 2009-05-22 16:23:54

Answer 4

+2 A:

Code snippet:

public static double StandardDeviation(List<double> valueList)
{
    if (valueList.Count < 2) return 0.0;
    double sumOfSquares = 0.0;
    double average = valueList.Average(); //.NET 3.0
    foreach (double value in valueList) 
    {
        sumOfSquares += Math.Pow((value - average), 2);
    }
    return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}

Demi 2009-05-22 01:44:37

Dividing by Count - 1 or Count depends on whether we're talking about entire population or sample, yes? Looks like OP is talking about a known population but not entirely clear.

John Pirie 2009-05-22 01:55:38

That is correct - this is for sample variance. I appreciate the highlight.

Demi 2009-05-22 01:58:24

Your code crashes for the legitimate case of a list with one value.

SPWorley 2009-05-22 02:39:32

for a sample stddev you shouldn't be passing an list with one item. A sample stddev of one item is worthless.

Demi 2009-05-22 03:23:04

edited in response to comment.

Demi 2009-05-22 03:25:42

Answer 5

+14 A:

While the sum of squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance...

Plus, don't never, ever, ever, compute a^2 as pow(a,2), a * a is almost certainly faster.

By far the best way of computing a standard deviation is Welford's method. My C is very rusty, but it could look something like:

public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    double tmpM = 0.0;
    int k = 1;
    foreach (double value in valueList) 
    {
        double tmpM == M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    return Math.Sqrt(S / (k-1));
}

EDIT: I've updated the code accoridng to Jason's remarks...

Jaime 2009-05-22 11:46:25

+1: I've read Knuth's commentary on this but have never known it was called Welford's method. FYI you can eliminate the k==1 case, it just works.

Jason S 2009-05-22 12:36:10

OH: and you're forgetting the divide-by-N or divide-by-N-1 at the end.

Jason S 2009-05-22 12:57:13

Now look what you've done. You've given me something new to learn. You beast.

dmckee 2009-05-22 16:30:52

John Cook has a good article on standard deviation for large values at http://www.johndcook.com/blog/2008/09/26/comparing-three-methods-of-computing-standard-deviation/, and a followup describing the reasons behind the algorithms at http://www.johndcook.com/blog/2008/09/28/theoretical-explanation-for-numerical-results/.

Emperor XLII 2009-06-07 17:37:19

It actually is John's blog that I'm linking for a description of Welford's method...

Jaime 2009-06-08 15:39:51

ansaurus

tags:

views:

answers:

How do I determine the standard deviation (stddev) of a set of values?

related questions