tags:

views:

920

answers:

7

Hi folks,

I was wondering if it's possible to calculate the average of some numbers if I have this:

int currentCount = 12;
float currentScore = 6.1123   (this is a range of 1 <-> 10).

Now, if I receive another score (let's say 4.5), can I recalculate the average so it would be something like:

int currentCount now equals 13
float currentScore now equals ?????

or is this impossible and I still need to remember the list of scores?

+16  A: 

I like to store the sum and the count. It avoids an extra multiply each time.

current_sum += input;
current_count++;
current_average = current_sum/current_count;
John the Statistician
That's a good point about maintaining the sum, especially if you can defer the averaging until you summed ALL the numbers.
paxdiablo
But we shouldn't turn this into a mutual admiration society :-)
paxdiablo
Exactly; in general you can calculate the nth moment with sums of powers. For example, you can calculate the std.dev. with sums of squares, sums, and count. However, if you need a streaming std. dev. don't do that, do this: http://www.cs.berkeley.edu/~mhoemmen/cs194-fall2007/Tutorials/variance.pdf
John the Statistician
Wow, you really ARE a statistician, +1.
paxdiablo
And you have great hair! Oh sorry. We decided not to have a mutual admiration society...
slim
+2  A: 

You can store currentCount and sumScore and you calculate sumScore/currentCount.

rics
+18  A: 

The following formulas allow you to track averages just from stored average and count, as you requested.

currentScore = (currentScore * currentCount + newValue) / (currentCount + 1)
currentCount = currentCount + 1

This relies on the fact that your average is currently your sum divided by the count. So you simply multiply count by average to get the sum, add your new value and divide by (count+1), then increase count.

So, let's say you have the data {7,9,11,1,12} and the only thing you're keeping is the average and count. As each number is added, you get:

+--------+-------+----------------------+----------------------+
| Number | Count |   Actual average     | Calculated average   |
+--------+-------+----------------------+----------------------+
|      7 |     1 | (7)/1           =  7 | (0 * 0 +  7) / 1 = 7 |
|      9 |     2 | (7+9)/2         =  8 | (7 * 1 +  9) / 2 = 8 |
|     11 |     3 | (7+9+11)/3      =  9 | (8 * 2 + 11) / 3 = 9 |
|      1 |     4 | (7+9+11+1)/4    =  7 | (9 * 3 +  1) / 4 = 7 |
|     12 |     5 | (7+9+11+1+12)/5 =  8 | (7 * 4 + 12) / 5 = 8 |
+--------+-------+----------------------+----------------------+
paxdiablo
Shouldn't that be currentScore = (currentScore * currentCount + 4.5) / (currentCount + 1)
John the Statistician
Thanks, @John, wasn't thinking straight.
paxdiablo
No worries, it's an easy mistake to make
John the Statistician
Perfect! i confirmed this using google docs with 16 or so numbers, then added a 17 number and got the average. double checked with the formula above and it's identical .. all the way down to the 12 or so decimal place. Awesomesauce! (i didn't think it was going to be possible or that simple) :P
Pure.Krome
Awesomesauce ??? That's sound like my 4yo (easy peasy, lemon squeezy) :-)
paxdiablo
I worry about the accumulation of rounding errors. Storing the count and the sum, instead of the running average, reduces this (but does not eliminate it).
slim
Pax - what type of DB field should i use?
Pure.Krome
@Pure, DB as in database? I'm not sure what you're asking here.
paxdiablo
@slim, you're right about rounding but, unless you're averaging absolutely huge quantities of numbers, the relative error usually stays pretty low.
paxdiablo
@ Pax: I'm assuming the currentScore is of type FLOAT. As such, should i also store this field in the DB as a DB Type FLOAT or a DB Type of DECIMAL(10,2) or something?
Pure.Krome
@PureK, I would just use the FLOAT type. I come from a school of thought that chooses simplicity as long as it meets the requirements. Standard float fields in any database should be enough to store the information.
paxdiablo
Very awesomesauce. I didn't think this was possible when I was trying to do it oh so many years ago, but I never really thought about it... very cool.
Mark
@paxdiablo: In good circumstances, the rounding errors will be fairly even and balance each other out. However, there are lots of things that could lead to bad circumstances, like the data being input from a much less precise source. In that case, the inputs can be systematically biased. Note that keeping the sum around as opposed to the present average also means each step goes from "multiply, add, divide" to "add, divide".
Novelocrat
+1  A: 

float currentScore now equals (currentScore * (currentCount-1) + 4.5)/currentCount ?

Android
+3  A: 

It's quite easy really, when you look at the formula for the average: A1 + A2 + ... + AN/N. Now, If you have the old average and the N (numbers count) you can easily calculate the new average:

newScore = (currentScore * currentCount + someNewValue)/(currentCount + 1)
Kasprzol
+2  A: 

or... if you want to be silly, you can do it in one line :

 current_average = (current_sum = current_sum + newValue) / ++current_count;

:)

Alterlife
Does it make any difference if i do ++current_count vs current_count++ ??
Pure.Krome
Yes. One increments current_count before calculating current_average, the other does it after.
slim
so ++current_count increments BEFORE the division is done, wile current_count++ is incremented AFTER the division? ouch! just by looking at the code, i would have thought it would have been... "and divide the left side by the right side which is (current count + 1). Glad i asked!
Pure.Krome
That is true in C or C++. But not in Java. The answer will vary by language!
sharth
A: 

This is a very good source for basic math and algebra:

http://www.purplemath.com/modules/index.htm

BobbyShaftoe