views: 272
answers: 5

Say you have 100000000 32-bit floating-point values in an array, and each of these floats has a value between 0.0 and 1.0. If you tried to sum them all up like this

float result = 0.0f;
for (int i = 0; i < 100000000; i++) {
    result += array[i];
}

you'd run into precision problems as result grows much larger than 1.0: each new addend is tiny relative to the running total, so its low-order bits are rounded away.
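To make the failure mode concrete (assuming IEEE-754 single precision, where a float carries 24 significand bits): once the total reaches 2^24 = 16777216, adding 1.0f changes nothing at all:

#include <stdio.h>

int main(void) {
    float big = 16777216.0f;       /* 2^24 */
    printf("%.1f\n", big + 1.0f);  /* prints 16777216.0: the 1.0 is lost entirely */
    return 0;
}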

So what are some of the ways to more accurately perform the summation?

+1  A: 

Make result a double, assuming C or C++.
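A minimal sketch of that suggestion in C (the helper name sum_in_double is illustrative):

#include <stddef.h>

double sum_in_double(const float *array, size_t n) {
    double result = 0.0;        /* 53-bit significand absorbs most of the rounding */
    for (size_t i = 0; i < n; i++)
        result += array[i];     /* each float widens to double before the add */
    return result;
}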

Tuomas Pelkonen
Yes, that will help, but what if you have far more than 100000000 values to sum? My choice of 100000000 for this question was arbitrary.
splicer
A: 

If you're in .NET, you can use the LINQ .Sum() extension method that exists on IEnumerable&lt;float&gt;. Then it would just be:

var result = array.Sum();
Rob Packwood
Thanks, but I should be more specific: I'm working in C and OpenCL.
splicer
+20  A: 

Sounds like you want to use Kahan Summation.

According to Wikipedia,

The Kahan summation algorithm (also known as compensated summation) significantly reduces the numerical error in the total obtained by adding a sequence of finite precision floating point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors).

In pseudocode, the algorithm is:

function kahanSum(input)
    var sum = input[1]
    var c = 0.0             // A running compensation for lost low-order bits.
    for i = 2 to input.length
        y = input[i] - c    // So far, so good: c is zero.
        t = sum + y         // Alas, sum is big, y small, so low-order digits of y are lost.
        c = (t - sum) - y   // (t - sum) recovers the high-order part of y; subtracting y recovers -(low part of y).
        sum = t             // Algebraically, c should always be zero. Beware eagerly optimising compilers!
    next i                  // Next time around, the lost low part will be added to y in a fresh attempt.
    return sum
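Since you're in C, a direct translation might look like this (a sketch; the name kahan_sum is illustrative, and sum starts at 0.0f rather than input[0], which is equivalent):

#include <stddef.h>

float kahan_sum(const float *input, size_t n) {
    float sum = 0.0f;
    float c = 0.0f;                 /* running compensation for lost low-order bits */
    for (size_t i = 0; i < n; i++) {
        float y = input[i] - c;     /* fold the previous compensation into the next term */
        float t = sum + y;          /* low-order digits of y are lost here... */
        c = (t - sum) - y;          /* ...and recovered into c for the next iteration */
        sum = t;
    }
    return sum;
}

One caution: aggressive floating-point optimization (e.g. gcc's -ffast-math) may algebraically simplify c away to zero, defeating the algorithm, so compile this with strict FP semantics.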
Daniel Pryden
+1 for attempting to answer the poster's actual question.
John Fisher
Just what I was looking for! Thanks :)
splicer
Glad I could help.
Daniel Pryden
+1  A: 

If you can tolerate a little extra space (in Java):

float[] temp = new float[1000000];
float[] temp2 = new float[1000];
float sum = 0.0f;
for (int i = 0; i < 1000000000; i++) temp[i/1000] += array[i];
for (int i = 0; i < 1000000; i++) temp2[i/1000] += temp[i];
for (int i = 0; i < 1000; i++) sum += temp2[i];

Standard divide-and-conquer algorithm, basically. This only works if the numbers are randomly scattered; it won't work if the first half billion numbers are 1e-12 and the second half billion are much larger.
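Since the asker is working in C, the same divide-and-conquer idea can also be written as recursive pairwise summation (a sketch; the base-case cutoff of 8 is arbitrary). Its rounding error grows roughly with log n rather than n:

#include <stddef.h>

/* Pairwise (divide-and-conquer) summation: split, sum each half, combine. */
float pairwise_sum(const float *a, size_t n) {
    if (n <= 8) {                   /* small base case: plain loop */
        float s = 0.0f;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }
    size_t half = n / 2;
    return pairwise_sum(a, half) + pairwise_sum(a + half, n - half);
}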

But before doing any of that, one might just accumulate the result in a double. That'll help a lot.

Rex Kerr
A: 

The absolutely optimal way is to use a priority queue, as follows:

PriorityQueue<Float> q = new PriorityQueue<Float>();
for (float x : list) q.add(x);
while (q.size() > 1) q.add(q.poll() + q.poll());
return q.poll();

(this code assumes the numbers are positive; generally the queue should be ordered by absolute value)

Explanation: given a list of numbers, to add them up as precisely as possible you should strive to keep the operands close in magnitude, i.e. eliminate the difference between small and big ones. That's why you add the two smallest numbers first: doing so raises the minimum value of the list, shrinks the gap between the minimum and the maximum, and reduces the problem size by 1.

Unfortunately I have no idea how this can be vectorized, considering that you're using OpenCL, but I am almost sure it can be. You might take a look at this book on vector algorithms; it is surprising how powerful they actually are: Vector Models for Data-Parallel Computing
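That said, the standard data-parallel approach in OpenCL is not a priority queue but a tree reduction, which pairs values in a similar spirit. A sketch of a typical work-group reduction kernel (kernel and argument names are illustrative; assumes the work-group size is a power of two, and the host sums the per-group partials afterwards):

__kernel void partial_sum(__global const float *in,
                          __global float *partials,
                          __local float *scratch,
                          unsigned int n) {
    unsigned int gid = get_global_id(0);
    unsigned int lid = get_local_id(0);

    scratch[lid] = (gid < n) ? in[gid] : 0.0f;   /* load one element per work-item */
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Halve the active range each pass, adding pairs in local memory. */
    for (unsigned int stride = get_local_size(0) / 2; stride > 0; stride /= 2) {
        if (lid < stride)
            scratch[lid] += scratch[lid + stride];
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    if (lid == 0)
        partials[get_group_id(0)] = scratch[0];  /* one partial sum per work-group */
}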

jkff
Actually this is not an optimal solution. You want to minimize the absolute value of the intermediate results, which does not necessarily mean that you should always add the smallest numbers first. For example, if you want to sum [1.01, -0.001, -1.02, 0.0012], it is best to express it as (0.0012 - 0.001) + (1.01 - 1.02).
quant_dev