Does LINQ model the aggregate SQL function STDDEV() (standard deviation)?

If not, what is the simplest / best-practices way to calculate it?

Example:

  SELECT test_id, AVG(result) avg, STDDEV(result) std
    FROM tests
GROUP BY test_id
+5  A: 

You can write your own extension method to calculate it:

public static class Extensions
{
    public static double StdDev(this IEnumerable<double> values)
    {
       double ret = 0;

       if (values.Count() > 1)
       {
          //Compute the Average
          double avg = values.Average();

          //Perform the Sum of (value-avg)^2
          double sum = values.Sum(d => (d - avg) * (d - avg));

          //Put it all together
          ret = Math.Sqrt((sum) / values.Count()-1);
       }
       return ret;
    }
}

Transformed into extension from Adding Standard Deviation to LINQ by Chris Bennett.

Dynami Le Savard
I'd make that test "values.Count() > 1", because if it's exactly 1 you'll have a divide by zero error when you calculate the return value.
duffymo
Math.Pow(d-avg, 2)? I'd skip the function call and use (d-avg)*(d-avg).
duffymo
A: 

This is a comment on Dynami's answer. There is a math error in the final calculation: there should be parentheses around "values.Count()-1". Without them, the division is evaluated first, so the expression is "(sum / values.Count()) - 1". When "sum / values.Count()" is less than one, subtracting one from it gives a negative number, and Math.Sqrt of a negative number returns NaN. The line should read:

      //Put it all together
      ret = Math.Sqrt(sum / (values.Count()-1));
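
To see the effect of the missing parentheses, here is a minimal sketch that evaluates both forms of the expression on the same data. The class and method names (PrecedenceDemo, Unparenthesized, Parenthesized) are made up for illustration and do not appear in either answer.

```csharp
using System;
using System.Linq;

public static class PrecedenceDemo
{
    // Hypothetical helper: the expression as originally written.
    public static double Unparenthesized(double[] values)
    {
        double avg = values.Average();
        double sum = values.Sum(d => (d - avg) * (d - avg));
        // Parsed as (sum / Count) - 1, because / binds tighter than -.
        return Math.Sqrt(sum / values.Count() - 1);
    }

    // Hypothetical helper: the corrected expression.
    public static double Parenthesized(double[] values)
    {
        double avg = values.Average();
        double sum = values.Sum(d => (d - avg) * (d - avg));
        // Divides by (Count - 1), the sample standard deviation denominator.
        return Math.Sqrt(sum / (values.Count() - 1));
    }
}
```

For the values 1.0 and 2.0 the sum of squared deviations is 0.5, so the first form evaluates Math.Sqrt(0.5 / 2 - 1), that is Math.Sqrt(-0.75), which is NaN, while the second evaluates Math.Sqrt(0.5), roughly 0.7071.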
David Clarke
A: 

Dynami's answer (with fix for math) works but makes multiple passes through the data to get a result. This is a single pass method that provides the correct output:

public static double StdDev(this IEnumerable<double> values)
{
    // ref: http://warrenseen.com/blog/2006/03/13/how-to-calculate-standard-deviation/
    int n = 0;
    double mean = 0.0;
    double sum = 0.0;
    double stdDev = 0.0;
    foreach (double val in values)
    {
        n++;
        double delta = val - mean;
        mean += delta / n;
        sum += delta * (val - mean);
    }
    if (1 < n)
        stdDev = Math.Sqrt(sum / (n - 1));

    return stdDev;
}
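
For reference, the loop above appears to follow the running update commonly known as Welford's algorithm. Written out as a sketch, with \bar{x}_n standing for `mean` after n values, M_n for `sum`, and \bar{x}_0 = M_0 = 0 (the symbols are chosen here for illustration and are not from the answer):

```latex
\begin{aligned}
\bar{x}_n &= \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n}, \\
M_n &= M_{n-1} + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n), \\
s &= \sqrt{\frac{M_n}{n - 1}} \qquad (n > 1).
\end{aligned}
```

Here x_n - \bar{x}_{n-1} is `delta`, computed before the mean is updated, and x_n - \bar{x}_n is `val - mean`, computed after it. M_n equals the sum of squared deviations from the current mean, which is why dividing by n - 1 at the end gives the sample variance.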
David Clarke
You may not have iterated the entire sequence more than once, but your method will still make two calls to GetEnumerator (which could be triggering a complex SQL query). Why not skip the condition and check n at the end of the loop?
Gideon Engelberth
Thanks Gideon, removes a level of nesting too. You're correct about the SQL, it's not relevant to what I'm working on so I hadn't considered the implication.
David Clarke
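
To tie this back to the query in the question, here is a minimal sketch of the grouped calculation in LINQ to Objects, using the single-pass method above. TestRow, StatsExtensions, GroupedStats and Summarize are hypothetical names introduced for illustration, with TestRow standing in for a row of the tests table. It assumes the rows are already in memory: a custom extension method like StdDev generally cannot be translated to SQL by a query provider, so the data would have to be materialized first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StatsExtensions
{
    // Single-pass sample standard deviation; returns 0 for fewer than two values.
    public static double StdDev(this IEnumerable<double> values)
    {
        int n = 0;
        double mean = 0.0;
        double sum = 0.0;
        foreach (double val in values)
        {
            n++;
            double delta = val - mean;
            mean += delta / n;
            sum += delta * (val - mean);
        }
        return n > 1 ? Math.Sqrt(sum / (n - 1)) : 0.0;
    }
}

// Hypothetical row type standing in for the "tests" table.
public sealed class TestRow
{
    public int TestId { get; }
    public double Result { get; }

    public TestRow(int testId, double result)
    {
        TestId = testId;
        Result = result;
    }
}

public static class GroupedStats
{
    // Rough equivalent of:
    //   SELECT test_id, AVG(result), STDDEV(result) FROM tests GROUP BY test_id
    public static List<(int TestId, double Avg, double Std)> Summarize(IEnumerable<TestRow> tests)
    {
        return tests
            .GroupBy(t => t.TestId)
            .Select(g => (TestId: g.Key,
                          Avg: g.Average(t => t.Result),
                          Std: g.Select(t => t.Result).StdDev()))
            .OrderBy(r => r.TestId)
            .ToList();
    }
}
```

Note that a group with a single row yields 0 here, whereas a database STDDEV may return NULL for that case, so the two are not guaranteed to agree on edge cases.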