
Could you please suggest any simple Java statistics packages?

I don't necessarily need any of the advanced stuff. I was quite surprised that there does not appear to be a function to calculate the mean in the java.lang.Math class...

What are you guys using for this?


EDIT

Regarding:

How hard is it to write a simple class that calculates means and standard deviations?

Well, not hard. I only asked this question after having hand-coded these. But not having even these simplest functions at hand when I needed them only added to my Java frustration. I don't remember the formula for calculating stdev by heart :)
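For reference, here is a minimal sketch of such a hand-coded helper, assuming the values are already in a `double[]`. The class name `SimpleStats` is hypothetical. It uses the two-pass approach (compute the mean first, then sum the squared deviations from it) and offers both the population standard deviation (divide by n) and the sample standard deviation (divide by n - 1); which one you want depends on whether your data is the whole population or only a sample of it.

```java
/** Minimal two-pass mean and standard deviation helper (hypothetical class name). */
final class SimpleStats {
    private SimpleStats() {}

    /** Arithmetic mean: the sum of the values divided by their count. */
    static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sum of squared deviations from the mean (second pass over the data). */
    private static double sumOfSquaredDeviations(double[] values) {
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            double d = v - m;
            sumSq += d * d;
        }
        return sumSq;
    }

    /** Population standard deviation: divide by n. */
    static double populationStdDev(double[] values) {
        return Math.sqrt(sumOfSquaredDeviations(values) / values.length);
    }

    /** Sample standard deviation: divide by (n - 1). */
    static double sampleStdDev(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        return Math.sqrt(sumOfSquaredDeviations(values) / (values.length - 1));
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("mean             = " + mean(data));             // 5.0
        System.out.println("population stdev = " + populationStdDev(data)); // 2.0
        System.out.println("sample stdev     = " + sampleStdDev(data));
    }
}
```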

+7  A: 

Apache Commons Math, specifically DescriptiveStatistics and SummaryStatistics.

John Paulett
Thanks. I take it there isn't really a reason to look any further. Are you satisfied with Apache Commons, or is it just so-so: good enough, but could be better?
Peter Perháč
Just discovered this library, precisely for calculating mean and standard deviation. Very easy to pick up. +1
Grundlefleck
I've found it to fit my needs well. While I've never personally run into this issue or cared, I have a coworker who found it to be slower at computing the mean of an array than a plain loop that adds up the values and then divides by the size of the array. However, his code was averaging things that would likely never cause integer overflow errors. I assume that Commons Math is a little smarter and won't let integers overflow.
John Paulett
The APIs use `double` not `int` or `long` so integer overflow is not an issue. However, they cannot handle value sets with more than `Integer.MAX_VALUE` doubles.
Stephen C
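To illustrate the overflow point: wraparound only happens when integral values are summed into an integral accumulator. The sketch below (hypothetical class `IntMeanDemo`) averages an `int[]` twice, once with an `int` running total, which silently wraps past `Integer.MAX_VALUE`, and once with a `long` running total, which cannot overflow for any Java array because an array has fewer than 2^31 elements, each with magnitude at most 2^31. Summing `double` values, as the comment above says the Commons Math APIs do, avoids wraparound altogether, at the cost of ordinary floating-point rounding.

```java
/** Contrasts an int accumulator with a long accumulator when averaging ints (hypothetical class). */
final class IntMeanDemo {
    private IntMeanDemo() {}

    // Naive loop: the running total is an int, so it silently wraps around
    // once it exceeds Integer.MAX_VALUE.
    static double meanWithIntSum(int[] values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    // Same loop with a long accumulator: no int[] is long enough to overflow it.
    static double meanWithLongSum(int[] values) {
        long sum = 0L;
        for (int v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    public static void main(String[] args) {
        int[] big = {Integer.MAX_VALUE, Integer.MAX_VALUE};
        System.out.println(meanWithIntSum(big));  // -1.0 (the int sum wrapped to -2)
        System.out.println(meanWithLongSum(big)); // 2.147483647E9
    }
}
```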
You should not have to store an array to calculate a mean or standard deviation. It's easy to do both without having to take up all that memory.
duffymo
How hard is it to write a simple class that calculates means and standard deviations? Must there be a library for everything?
duffymo
@duffymo, the original data was in an array, so it was just keeping a running total and then dividing by the size of the array.
John Paulett
Yes, I realize that. All I'm saying is that an array isn't necessary. It's not even desirable if you're trying to minimize the amount of memory you consume.
duffymo
@duffymo As a classic Java programmer, I am definitely not concerned by stuff like how much memory my programs consume. (<-- joking, of course) As to `How hard is it to write a simple class that calculates means and standard deviations?`, well, not hard. I only asked this question *after* having hand-coded these. But not having even these simplest functions at hand when I needed them only added to my Java frustration. I don't remember the formula for calculating stdev by heart :)
Peter Perháč
@duffymo - my reading of the Apache library APIs is that they require you to pass the values to be averaged in an array.
Stephen C
@MasterPeter - but I'm sure you remember the URL for Wikipedia by heart :-) :-)
Stephen C
@duffymo, while I also would have found it easy to write the functions I used in commons-math, stupid mistakes can be made by anyone, at any time. Sometimes I prefer not to leave it to chance. Also, in some cases it's preferable to up the memory footprint in exchange for a tested solution. All depends on the situation I guess...
Grundlefleck
@Grundlefleck - I agree that everyone makes stupid mistakes, and I realize the value of libraries, but simple mean and standard deviation calculators are low on the risk scale. They're easy to write, easy to test, and then you can put them aside. There's also an argument that minimizing dependencies is a good idea. Why add another library to your app when it's so easy to roll your own?
duffymo
@Stephen C - agreed. I'm saying that's fine when you have a reasonable number of values, but as the array size grows you'll have a problem storing them. What do you do in the case of a runtime app that you want to keep a running tab on mean and standard deviation of values as they arrive? Your array won't be very useful in that situation.
duffymo
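A minimal sketch of such a running tab, assuming values arrive one at a time as `double`s. Neither comment names an algorithm; this one uses Welford's online algorithm, which keeps only the count, the current mean, and the running sum of squared deviations, so memory use stays constant no matter how many values arrive. It is generally preferred over keeping a plain sum and sum of squares, because the textbook shortcut (sum of x² minus n times the squared mean) can lose precision to cancellation when the variance is small relative to the mean. The class name `RunningStats` is hypothetical.

```java
/**
 * Running mean and sample standard deviation without storing the values,
 * using Welford's online algorithm (hypothetical class name).
 */
final class RunningStats {
    private long n = 0;         // number of values seen so far
    private double mean = 0.0;  // mean of the values seen so far
    private double m2 = 0.0;    // sum of squared deviations from the current mean

    /** Folds one new value into the running totals in O(1) time and memory. */
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // second factor uses the updated mean
    }

    long count() {
        return n;
    }

    double mean() {
        if (n == 0) throw new IllegalStateException("no values yet");
        return mean;
    }

    /** Sample variance, dividing by (n - 1). */
    double sampleVariance() {
        if (n < 2) throw new IllegalStateException("need at least two values");
        return m2 / (n - 1);
    }

    double sampleStdDev() {
        return Math.sqrt(sampleVariance());
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        // The array only simulates a stream; RunningStats keeps no copy of the values.
        for (double x : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(x);
        }
        System.out.println("n      = " + stats.count());
        System.out.println("mean   = " + stats.mean());
        System.out.println("stdev  = " + stats.sampleStdDev());
    }
}
```

`add` can be called from whatever code receives the values as they arrive; note that the class is not thread-safe as written.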
+5  A: 

Just responding to this part of the question:

I was quite surprised that there does not appear to be a function to calculate the mean in the java.lang.Math class...

I was not surprised by this. There are a lot of "useful algorithms" that the Java class libraries do not implement. They cannot implement everything, and in this they are no different from the libraries of other programming languages.

Actually, it would be a bad thing if Sun tried to implement too much in J2SE:

  1. It would take more designer / developer / technical documenter time ... with no clear "return on investment".

  2. It would increase the Java footprint; e.g. the size of "rt.jar". (Or if they tried to mitigate that, it would result in more platform complexity ... )

  3. For things in the mathematical space, you often need to implement the algorithms in different ways (with different APIs) to cater for different requirements.

  4. For complex things, it may be better for Sun not to try to "standardise" the APIs, but leave it to some other interested / skilled group to do it; e.g. the Apache folks.

Stephen C
+1 from me - well said.
duffymo