I need to calculate averages, standard deviations, medians, etc. for a bunch of numerical data. Is there a good open source .NET library I can use? I have found NMath, but it is not free and may be overkill for my needs.
There appear to be only commercial providers in the .NET space (and none of them are cheap).
There are open source C libraries, but that doesn't help you.
You might have to roll your own, or port one of the open source libraries to .NET (at least the features you care about).
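If you do roll your own, the three functions mentioned in the question fit in a few lines on top of LINQ. The following is only a minimal sketch: the class name BasicStats and its method names are made up for illustration, it assumes .NET 4.5 or later (for IReadOnlyCollection), and it computes the population standard deviation (dividing by N) in two passes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; the names are illustrative, not from any library.
public static class BasicStats
{
    public static double Mean(IReadOnlyCollection<double> values)
    {
        if (values.Count == 0)
            throw new InvalidOperationException("Can't calculate the mean with no data");
        return values.Average();
    }

    // Population standard deviation (divides by N), computed in two passes:
    // first the mean, then the sum of squared deviations from it.
    public static double StandardDeviation(IReadOnlyCollection<double> values)
    {
        double mean = Mean(values);
        double sumOfSquaredDeviations = values.Sum(v => (v - mean) * (v - mean));
        return Math.Sqrt(sumOfSquaredDeviations / values.Count);
    }

    // Median: the middle value of the sorted data, or the average of the
    // two middle values when the count is even.
    public static double Median(IReadOnlyCollection<double> values)
    {
        if (values.Count == 0)
            throw new InvalidOperationException("Can't calculate the median with no data");
        double[] sorted = values.OrderBy(v => v).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1
            ? sorted[mid]
            : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

For the data 2, 4, 4, 4, 5, 5, 7, 9 this should give a mean of 5, a standard deviation of 2 and a median of 4.5. Divide by (Count - 1) instead if you need the sample standard deviation.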
I found this on the CodeProject website. It looks like a good C# class for handling most of the basic statistical functions.
I decided it was quicker to write my own class that does just what I needed. Here's the code:
using System;
using System.Collections.Generic;

/// <summary>
/// Very basic statistical analysis routines
/// </summary>
public class Statistics
{
    private readonly List<double> numbers = new List<double>();
    private double sumOfSquares;

    public double Sum { get; private set; }
    public double Min { get; private set; }
    public double Max { get; private set; }

    public int Count
    {
        get { return numbers.Count; }
    }

    public void Add(double number)
    {
        if (Count == 0)
        {
            // The first value seeds both extremes
            Min = Max = number;
        }
        numbers.Add(number);
        Sum += number;
        sumOfSquares += number * number;
        Min = Math.Min(Min, number);
        Max = Math.Max(Max, number);
    }

    public double Average
    {
        get { return Sum / Count; }
    }

    /// <summary>
    /// Population standard deviation (divides by N, not N - 1)
    /// </summary>
    public double StandardDeviation
    {
        get
        {
            double variance = sumOfSquares / Count - (Average * Average);
            // Rounding error can push the variance slightly below zero;
            // clamp it so that Math.Sqrt does not return NaN
            return Math.Sqrt(Math.Max(0.0, variance));
        }
    }

    /// <summary>
    /// A simple implementation of Median
    /// Returns the middle number if there is an odd number of elements
    /// Returns the average of the two middle numbers if there is an even number of elements
    /// Sorts the list on every call, so should be optimised for performance if planning
    /// to call lots of times
    /// </summary>
    public double Median
    {
        get
        {
            if (numbers.Count == 0)
                throw new InvalidOperationException("Can't calculate the median with no data");
            numbers.Sort();
            int middleIndex = Count / 2;
            if (Count % 2 == 1)
                return numbers[middleIndex];
            return (numbers[middleIndex - 1] + numbers[middleIndex]) / 2.0;
        }
    }
}
You have to be careful. There are several ways to compute standard deviation that would give the same answer if floating point arithmetic were perfect. They're all accurate for some data sets, but some are far better than others under some circumstances.
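The method I've seen proposed here (mean of the squares minus the square of the mean) is the one that is most likely to give bad answers: when the mean is large compared with the spread of the data, the subtraction cancels most of the significant digits and can even produce a negative variance. I used it myself until it crashed on me.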
See Comparing three methods of computing standard deviation.
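One commonly recommended alternative is Welford's online algorithm, which updates a running mean and a running sum of squared deviations instead of accumulating raw squares. The following is only a sketch of that technique, not code from the article: the class name RunningStats is made up for illustration, and it returns the population variance (dividing by N) so that it is comparable with the Statistics class earlier in the thread.

```csharp
using System;

// Hypothetical class name; a sketch of Welford's online algorithm.
public sealed class RunningStats
{
    private long count;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    public long Count
    {
        get { return count; }
    }

    public double Mean
    {
        get { return count > 0 ? mean : double.NaN; }
    }

    public void Add(double x)
    {
        count++;
        double delta = x - mean;   // deviation from the old mean
        mean += delta / count;     // update the running mean
        m2 += delta * (x - mean);  // old deviation times new deviation
    }

    // Population variance (divides by N); use m2 / (count - 1)
    // for the sample variance instead.
    public double Variance
    {
        get { return count > 0 ? m2 / count : double.NaN; }
    }

    public double StandardDeviation
    {
        get { return Math.Sqrt(Variance); }
    }
}
```

As a quick check, feed it 1e9 + 4, 1e9 + 7, 1e9 + 13 and 1e9 + 16. The exact population variance is 22.5, and this sketch should reproduce it, because it only ever works with small deviations. The sum-of-squares formula has to subtract two intermediate values near 1e18, where adjacent doubles are more than 100 apart, so it cannot resolve a difference of 22.5.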
AForge.NET has an AForge.Math namespace that provides some basic statistics functions: histogram, mean, median, standard deviation and entropy.