OK, so I know you're generally not supposed to compare two floating-point numbers for equality. However, in William Kahan's "How Futile are Mindless Assessments of Roundoff in Floating-Point Computation?" he shows the following code (pseudocode, I believe):

Real Function T(Real z) :
      T := exp(z) ;                       ... rounded, of course.
      If (T = 1) Return( T ) ;            ... when |z| is very tiny.
      If (T = 0) Return( T := –1/z ) ;    ... when exp(z) underflows.
      Return( T := ( T – 1 )/log(T) ) ;   ... in all other cases.
      End T .

Now, I'm interested in implementing this in C or C++, and I have two related questions:

a) If I take T to be a double, would the 0 and 1 in the comparisons (T == 1) and (T == 0) be converted to double, so that the precision of the values in the mixed-type expression is preserved?

b) Does this still count as comparing two floating-point numbers for equality?

+3  A: 

Yes and yes.

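For 32-bit ints, double can represent every value exactly. When you compare a double to a 64-bit int, however, the conversion can round if the int's magnitude is greater than 2^53, since an IEEE 754 double has a 53-bit significand. You can use long double where it is wider (the x87 extended format has a 64-bit significand), but the standard only guarantees that long double is at least as precise as double, and on some compilers, such as MSVC, the two have the same format.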
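To see where int-to-double conversion stops being exact, here is a minimal sketch. It assumes double is IEEE 754 binary64; the helper round_trips_through_double and the constant kTwoTo53 are names made up for this illustration.

```cpp
#include <cstdint>
#include <limits>

// Assumption: double is IEEE 754 binary64, i.e. it has a 53-bit significand.
static_assert(std::numeric_limits<double>::digits == 53,
              "this sketch assumes a 53-bit significand");

// Hypothetical helper: true if v survives int64 -> double -> int64 unchanged.
// Only call it with |v| well below 2^63 so the cast back stays in range.
bool round_trips_through_double(std::int64_t v) {
    const double d = static_cast<double>(v);   // may round for large |v|
    return static_cast<std::int64_t>(d) == v;
}

// 2^53: every integer up to this magnitude is exactly representable.
const std::int64_t kTwoTo53 = std::int64_t{1} << 53;
```

Under that assumption, kTwoTo53 and everything below it round-trips, while kTwoTo53 + 1 is rounded to a neighboring even value and does not.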

Of course, the best way is just to use a floating-point literal: 1.0 or just 1. has type double, 1.0f is a float, and my_float_type(1) has whatever type it's supposed to :v) .
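The literal types can be checked at compile time. This is only a sketch: my_float_type is defined here as an alias for float purely for illustration, and is_one is a made-up helper.

```cpp
#include <type_traits>

// Unsuffixed floating-point literals are double; the suffix selects the type.
static_assert(std::is_same<decltype(1.0), double>::value, "1.0 is a double");
static_assert(std::is_same<decltype(1.), double>::value, "1. is a double");
static_assert(std::is_same<decltype(1.0f), float>::value, "1.0f is a float");
static_assert(std::is_same<decltype(1.0L), long double>::value,
              "1.0L is a long double");

// Hypothetical alias standing in for whatever type your code uses.
using my_float_type = float;
static_assert(std::is_same<decltype(my_float_type(1)), my_float_type>::value,
              "the functional cast yields my_float_type");

// Comparison against a literal of the matching type; no conversion is needed.
bool is_one(double t) { return t == 1.0; }
```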

Potatoswatter
+1  A: 

The integer gets converted to a double.

See the usual arithmetic conversions, described at the beginning of the Expressions clause of the C++ standard (section 5 in C++03 and C++11).
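A small compile-time check of that conversion, as a sketch only. The two helper functions are hypothetical names; the point is that both perform the same double-to-double comparison.

```cpp
#include <type_traits>

// In a mixed int/floating expression the int operand is converted to the
// floating type, so the result has the floating type.
static_assert(std::is_same<decltype(0.5 + 1), double>::value,
              "double + int yields double");
static_assert(std::is_same<decltype(0.5f + 1), float>::value,
              "float + int yields float");

// Hypothetical helpers: the int literal 1 is converted to 1.0 before the
// comparison, so these two functions behave identically.
bool equals_one_int_literal(double t) { return t == 1; }
bool equals_one_double_literal(double t) { return t == 1.0; }
```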

If you know that the floating-point numbers involved hold exact values, as the small integers 0 and 1 do, then you don't need to worry about inexact representations.
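For what it's worth, a direct C++ transcription of the pseudocode from the question might look like the sketch below. It assumes T is a double and does not try to handle cases the pseudocode leaves out, such as exp(z) overflowing to infinity.

```cpp
#include <cmath>

// Sketch of the question's pseudocode, with T taken to be double.
double T(double z) {
    const double t = std::exp(z);       // rounded, of course
    if (t == 1.0) return t;             // |z| is very tiny: exp(z) rounds to 1
    if (t == 0.0) return -1.0 / z;      // exp(z) underflows to 0
    return (t - 1.0) / std::log(t);     // all other cases
}
```

The equality tests are intentional here: they ask whether exp(z) rounded to exactly 1 or exactly 0, and both of those values are exactly representable.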

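Unsigned integers can be represented exactly as floating-point numbers as long as they fit into the stored mantissa + 1 bits (the extra bit is the implicit leading 1); for signed integers it is mantissa + 2 bits, because the sign is stored separately. The one exception is the most negative integer (-2^31 for 32-bit ints), which is a power of two and so is exact even when it does not fit.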

Fractions with a power of 2 in the denominator can also be represented exactly, provided the numerator fits in the mantissa and the exponent stays in range.
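These representability rules can be tried out with float, whose smaller mantissa makes the limits easy to hit. A sketch, assuming IEEE 754 binary32 and binary64; all three function names are made up for the example.

```cpp
#include <cstdint>
#include <limits>

// Assumption: float is IEEE 754 binary32 (23 stored mantissa bits + 1 implicit).
static_assert(std::numeric_limits<float>::digits == 24,
              "this sketch assumes a 24-bit significand");

// Hypothetical helper: true if the 32-bit int v is exactly representable as a
// float. The comparison is done in double, which holds every 32-bit int and
// every float exactly.
bool exact_in_float(std::int32_t v) {
    return static_cast<double>(static_cast<float>(v)) == static_cast<double>(v);
}

// Halves and quarters are sums of powers of two, so this addition is exact.
bool dyadic_sum_is_exact() { return 0.5 + 0.25 == 0.75; }

// 0.1, 0.2 and 0.3 have no finite binary expansion; each literal is already
// rounded, and the rounded sum differs from the rounded 0.3.
bool decimal_sum_is_exact() { return 0.1 + 0.2 == 0.3; }
```

Here exact_in_float holds for 16777216 (2^24) and for the most negative 32-bit int, but not for 16777217 or for the largest 32-bit int.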

starblue