The source data:

    static double[] felix = new double[] { 0.003027523, 0.002012256, -0.001369238, -0.001737660, -0.001647287, 
        0.000275154, 0.002017238, 0.001372621, 0.000274148, -0.000913576, 0.001920263, 0.001186456, -0.000364631, 
        0.000638337, 0.000182266, -0.001275626, -0.000821093, 0.001186998, -0.000455996, -0.000547445, -0.000182582,
        -0.000547845, 0.001279006, 0.000456204, 0.000000000, -0.001550388, 0.001552795, 0.000729594, -0.000455664, 
        -0.002188184, 0.000639620, 0.000091316, 0.001552228, -0.001002826, 0.000182515, -0.000091241, -0.000821243,
        -0.002009132, 0.000000000, 0.000823572, 0.001920088, -0.001368863, 0.000000000, 0.002101800, 0.001094291, 
        0.001639643, 0.002637323, 0.000000000, -0.000172336, -0.000462665, -0.000136141 };

The variance function:

    public static double Variance(double[] x)
    {
        if (x.Length == 0)
            return 0;
        double sumX = 0;
        double sumXsquared = 0;
        double varianceX = 0;
        int dataLength = x.Length;


        for (int i = 0; i < dataLength; i++)
        {
            sumX += x[i];
            sumXsquared += x[i] * x[i];
        }

        varianceX = (sumXsquared / dataLength) - ((sumX / dataLength) * (sumX / dataLength));
        return varianceX;
    }

Excel and an online calculator both report a variance of 1.56562E-06, while my function gives 1.53492394804015E-06. I am starting to wonder whether C# has a floating-point accuracy problem. Has anyone run into this before?

+9  A: 

What you are seeing is the difference between sample variance and population variance; it has nothing to do with floating-point precision.

You are calculating population variance. Excel and that web site are calculating sample variance.

Var and VarP are distinct calculations, and you need to be careful about which one you use: Var (sample variance) divides the sum of squared deviations by n - 1, while VarP (population variance) divides by n. Unfortunately, people often refer to them as if they were interchangeable, which they are not. The same is true for standard deviation.

Sample variance for your data is 1.56562E-06, population variance is 1.53492394804015E-06.
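As a rough check, the two definitions differ only in the divisor, so either value can be derived from the other. The sketch below uses n = 51, the number of values in the `felix` array from the question; the symbols are the usual textbook ones and do not come from the original post.

$$
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{n}{n-1}\,\sigma^2
$$

$$
s^2 = \frac{51}{50}\times 1.53492394804015\times10^{-6} \approx 1.56562\times10^{-6}
$$

So the function in the question is computing exactly what its formula says; it just answers a different question than Excel does.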

From some code posted on CodeProject a while back:

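Variance in a sample:

    // Requires: using System; using System.Collections.Generic; using System.Linq;
    // Extension methods must be declared inside a static class.
    public static double Variance(this IEnumerable<double> source)
    {
        double avg = source.Average();
        // Sum of squared deviations from the mean.
        double d = source.Aggregate(0.0, (total, next) => total + Math.Pow(next - avg, 2));
        // Divide by n - 1; this needs at least two values to be meaningful.
        return d / (source.Count() - 1);
    }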

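Variance in a population:

    // Same requirements as above: System, System.Collections.Generic, System.Linq.
    public static double VarianceP(this IEnumerable<double> source)
    {
        double avg = source.Average();
        // Sum of squared deviations from the mean.
        double d = source.Aggregate(0.0, (total, next) => total + Math.Pow(next - avg, 2));
        // Divide by n; an empty sequence makes Average() throw.
        return d / source.Count();
    }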
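
If you would rather keep the single-pass sums from the question, one possible adaptation is sketched below. The class name `VarianceDemo`, the `sample` flag, and the small test array are hypothetical and only for illustration; returning 0 for degenerate input is an assumption that mirrors the question's own choice for an empty array.

```csharp
using System;

public static class VarianceDemo
{
    // Hypothetical helper: same running sums as the question's function,
    // with a flag that selects the sample (n - 1) or population (n) divisor.
    public static double Variance(double[] x, bool sample)
    {
        int n = x.Length;
        // Assumption: return 0 for degenerate input, as the question does for an empty array.
        if (n == 0 || (sample && n < 2))
            return 0;

        double sumX = 0;
        double sumXSquared = 0;
        foreach (double v in x)
        {
            sumX += v;
            sumXSquared += v * v;
        }

        double mean = sumX / n;
        double population = sumXSquared / n - mean * mean;

        // Rescale by n / (n - 1) to turn the population variance into the sample variance.
        // "population * n" is evaluated first, so no integer division takes place.
        return sample ? population * n / (n - 1) : population;
    }

    public static void Main()
    {
        double[] data = { 2, 4, 4, 4, 5, 5, 7, 9 };
        Console.WriteLine(Variance(data, false)); // population variance: 4
        Console.WriteLine(Variance(data, true));  // sample variance: 32/7, roughly 4.5714
    }
}
```

Note that the sum-of-squares form can lose precision when the mean is large relative to the spread of the data, so the two-pass versions above are usually the safer choice.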
dkackman
Nice answer!
Richard Morgan
Well, thank you!
dkackman