views:

418

answers:

5

Suppose we have N numbers(integers, floats, whatever you want) and want to find their arithmetic mean. Simplest method is to sum all values and divide by number of values:

def simple_mean(array[N]): # pseudocode
    sum = 0
    for i = 1 to N
       sum += array[i]
    return sum / N

It works fine, but requires big integers. If we don't want big integers and we are fine with rounding errors, and N is the power of two, we can use 'divide-and-conquer' : ((a+b)/2 + (c+d)/2)/2 = (a+b+c+d)/4, ((a+b+c+d)/4 + (e+f+g+h)/4)/2 = (a+b+c+d+e+f+g+h)/8, so on.

def bisection_average(array[N]):
   if N == 1: return array[1]
   return (bisection_average(array[:N/2])+bisection_average(array[N/2:]))/2

Any other ways?

PS. playground for lazy

+3  A: 

If the big integers are problem... is it ok

a/N + b/N+.... n/N

I mean you're looking just for other ways or the optimal way?

Svetlozar Angelov
why?!?! If a,b,etc are integers this will give you an incorrect answer. If they are floating point, I'm not sure but my hunch is you will get more rounding errors than if you just performed a sum and then divided. In either case the computation time is increased greatly for a questionable benefit.
Jason S
+1  A: 

If you use float you might avoid big integers:

def simple_mean(array[N]):
    sum = 0.0 # <---
    for i = 1 to N
       sum += array[i]
    return sum / N
Nick D
+17  A: 

Knuth lists the following method for calculating mean and standard deviation given floating point (original on p. 232 of Vol 2 of The Art of Computer Programming, 1998 edition; my adaptation below avoids special-casing the first iteration):

double M=0, S=0;

for (int i = 0; i < N; ++i)
{
    double Mprev = M;
    M += (x[i] - M)/(i+1);
    S += (x[i] - M)*(x[i] - Mprev);
}

// mean = M
// std dev = sqrt(S/N) or sqrt(S/N+1)
// depending on whether you want population or sample std dev
Jason S
+8  A: 

Here's a way to calculate the mean using just integers without rounding errors and avoiding big intermediate values:

sum = 0
rest = 0
for num in numbers:
  sum += num / N
  rest += num % N
  sum += rest / N
  rest = rest % N

return sum, rest
sepp2k
+1 Very clever!
Martin B
This basically uses multiprecision (dual word) arithmetic. I think there's a way to optimize this to get the number of divide-like (/ or %) operations down, but I can't remember off the top of my head.
Jason S
The usual technique is to calculate X/N and X%N in a single function/single operation. This is because the underlying algorithms are pretty much the same.
MSalters
yes although C doesn't expose them. >:(No, I meant more like: sum += (num + rest) / N; rest = (num + rest) % N;except that may be prone to overflow
Jason S
C doesn't expose div/mod with a single function or operator, but most good compilers will generate the instruction/call if you use both.
Stephen Canon
+2  A: 

If the array is floating-point data, even the "simple" algorithm suffers from rounding error. Interestingly, in that case, blocking the computation into sqrt(N) sums of length sqrt(N) actually reduces the error in the average case (even though the same number of floating-point roundings are performed).

For integer data, note that you don't need general "big integers"; if you have fewer than 4 billion elements in your array (likely), you only need an integer type 32 bits larger than that the type of the array data. Performing addition on this slightly larger type will pretty much always be faster than doing division or modulus on the type itself. For example, on most 32 bit systems, 64-bit addition is faster than 32-bit division/modulus. This effect will only become more exaggerated as the types become larger.

Stephen Canon