ansaurus

Question

Answer 1

A:

I would be very surprised if that was possible using IEEE 754 arithmetics. The point is that IEEE 754 states that intermediate results are infinitely precise and then rounded to the destination data type.

So if you have sum + V[i], that value will always be greater or equal to sum. Rounding down to the next representable number will either produce sum or a number greater than sum.

Of course, there is nothing in the original question that prevents sum from being negative in the first place. In that case the answer would be trivial.

In IEEE 754 arithmetics, there is no number f such that 0 + f = 0 and at the same time f > 0.

Roland Illig 2010-07-15 19:58:44

thank you, I added more information to question, including source code link

aaa 2010-07-15 21:23:22

Answer 2

+2 A:

If V is an array of doubles and sum is a float (or single), you can certainly have values that are > 0 but when added to sum produces 0 if they are smaller than the smallest non-zero denormalized value representable in a float.

How do you know sum is actually zero and not just really really close? Are all bits set to zero?

EDIT: after reading the actual application, the underflow to zero remark is probably referring to repeated rotations around various axes to determine the eigenvalues and eigenvectors of a matrix. In that case, the algorithm only works if you can assume that repeated multiplications of very small numbers will clamp or underflow to zero. The actual sum won't underflow itself, however.

MSN 2010-07-15 20:10:54

Assuming that `sum` already contains `FLT_MIN` and an arbitrarily small value is added to it, how can the correctly rounded result ever become smaller than `sum`? I'm asking because `sum` will always be nearer to the exact result than `0`.

Roland Illig 2010-07-15 20:15:50

@Roland, he doesn't say what sum is initialized to, but I'm going to assume it's 0. And a double can be smaller than FLT_MIN. And converges to zero and being 0 bitwise are two different things. But yes, sum should be monotonically increasing.

MSN 2010-07-15 20:21:10

here http://www.mpi-hd.mpg.de/astrophysik/HEA/internal/Numerical_Recipes/f11-1.pdf, page 460. all types are same

aaa 2010-07-15 21:15:11

Answer 3

+1 A:

What are the types you are using? If f is a float and d1 and d2 are doubles, you get this.

double d1 = std::numeric_limits<double>::min();
double d2 = std::numeric_limits<double>::min();
float f = d1 + d2;
if (f == 0.0) std::cout << "yes";
else std::cout << "no";

This produces "yes".

cape1232 2010-07-15 20:15:22

Man you SO members are fast. How do newbies like me get any reputation when we can't beat anyone to the punch? ;)

cape1232 2010-07-15 20:18:14

@cape1232, we all started out that way. You should see how fast litb responds to esoteric C++ questions. Or Jon Skeet in general.

MSN 2010-07-15 20:21:56

@MSN litb is very helpful indeed. He answered many questions for me, many thanks to him

aaa 2010-07-15 21:24:42

ansaurus

tags:

views:

answers:

converge to zero via underflow

related questions