In short: how can I execute a+b such that any loss of precision due to truncation is away from zero rather than toward zero?
The Long Story
I'm computing the sum of a long series of floating-point values in order to compute the sample mean and variance of the set. Since Var(X) = E(X²) − E(X)², it suffices to maintain a running count of all numbers, the sum of all numbers so far, and the sum of the squares of all numbers so far.
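Written out in terms of the three running quantities, with n = count, the estimates are:

$$
\hat{E}(X) = \frac{\mathrm{sum}}{n}, \qquad
\hat{E}(X^2) = \frac{\mathrm{sumOfSquares}}{n}, \qquad
\widehat{\mathrm{Var}}(X) = \frac{\mathrm{sumOfSquares}}{n} - \left(\frac{\mathrm{sum}}{n}\right)^{2}
$$

For example, for the values 1, 2, 3, 4: n = 4, sum = 10 and sumOfSquares = 30, so the variance is 30/4 − (10/4)² = 7.5 − 6.25 = 1.25. This is the population (biased) form; multiplying it by n/(n − 1) gives the unbiased sample variance (here 5/3).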
So far so good.
However, it's absolutely required that E(X²) ≥ E(X)², which, due to limited floating-point precision, isn't always the case. Simplified, the problem is this:
```c
int count = 0;
double sum = 0.0, sumOfSquares = 0.0;

/* called once for each new value in the series */
void addValue(double value)
{
    double sqrVal = value * value;
    count++;
    sum += value;           /* slightly rounded down, since value is truncated to fit into sum */
    sumOfSquares += sqrVal; /* rounded down MORE, since the order-of-magnitude difference
                               between sqrVal and sumOfSquares is twice that between value and sum */
}
```
For sequences with real spread, this matters little: you end up slightly underestimating the variance, which is usually acceptable. However, for constant or almost-constant sets with a non-zero mean, it can lead to E(X²) < E(X)², i.e. a negative computed variance, which violates the expectations of the consuming code.
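A minimal C sketch of this failure mode, assuming IEEE 754 doubles in the default round-to-nearest mode and no extended-precision intermediates: the hypothetical helper `naive_variance_of_constant` feeds n copies of one value through the running count, sum and sum of squares, then applies E(X²) − E(X)². Mathematically the answer is always 0.

```c
/* Hypothetical helper (not from the original code): feeds n copies of
   `value` through the running count / sum / sumOfSquares accumulators
   and returns E(X^2) - E(X)^2. The exact answer is 0 for any value;
   the computed one depends on how each floating-point operation rounds.
   n must be positive. */
static double naive_variance_of_constant(double value, int n)
{
    int count = 0;
    double sum = 0.0, sumOfSquares = 0.0;

    for (int i = 0; i < n; i++) {
        double sqrVal = value * value;
        count++;
        sum += value;
        sumOfSquares += sqrVal;
    }

    double mean = sum / count;                   /* E(X)   */
    double meanOfSquares = sumOfSquares / count; /* E(X^2) */
    return meanOfSquares - mean * mean;          /* E(X^2) - E(X)^2 */
}
```

With value = 0.1 and n = 3 the result is a tiny negative number (about −1.7e−18) even though all three inputs are identical; with an exactly representable value such as 2.0, every step is exact and the result is exactly 0.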
Now, I know about Kahan summation, but it isn't an attractive solution. First, it makes the code susceptible to optimization vagaries (depending on optimization flags, the code may or may not exhibit this problem), and second, the problem isn't really due to the precision, which is good enough; it's that addition introduces a systematic error toward zero. If I could execute the line
```c
sumOfSquares += sqrVal;
```
in such a way as to ensure that sqrVal is rounded up, not down, to the precision of sumOfSquares, I'd have a numerically reasonable solution. But how can I achieve that?
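For comparison, a minimal C sketch of the standard Kahan (compensated) summation mentioned above; `KahanSum` and `kahan_add` are hypothetical names. It carries the low-order bits lost in each addition in a separate compensation term. It also shows the optimization concern raised in the question: under value-unsafe flags such as GCC's `-ffast-math`, the compiler may simplify `(t - k->sum) - y` to zero and silently turn this back into naive summation.

```c
/* Kahan (compensated) summation. Must be compiled without flags that
   allow floating-point reassociation (e.g. -ffast-math), or the
   compensation term may be optimized away. */
typedef struct {
    double sum; /* running total */
    double c;   /* running compensation for lost low-order bits */
} KahanSum;

static void kahan_add(KahanSum *k, double x)
{
    double y = x - k->c;      /* apply the correction from the previous step */
    double t = k->sum + y;    /* low-order bits of y may be lost here */
    k->c = (t - k->sum) - y;  /* recover what was lost (with opposite sign) */
    k->sum = t;
}
```

Starting from 1.0 and adding 2⁻⁵³ ten times, naive summation stays at exactly 1.0 (each addition is a tie that rounds to even), while this sketch ends at exactly 1 + 10·2⁻⁵³.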