views:

1749

answers:

16

What is the best method for comparing IEEE floats and doubles for equality? I have heard of several methods, but I wanted to see what the community thought.

+6  A: 

The best approach I think is to compare ULPs.

bool is_nan(float f)
{
    return (*reinterpret_cast<unsigned __int32*>(&f) & 0x7f800000) == 0x7f800000 && (*reinterpret_cast<unsigned __int32*>(&f) & 0x007fffff) != 0;
}

bool is_finite(float f)
{
    return (*reinterpret_cast<unsigned __int32*>(&f) & 0x7f800000) != 0x7f800000;
}

// if this symbol is defined, NaNs are never equal to anything (as is normal in IEEE floating point)
// if this symbol is not defined, NaNs are hugely different from regular numbers, but might be equal to each other
#define UNEQUAL_NANS 1
// if this symbol is defined, infinites are never equal to finite numbers (as they're unimaginably greater)
// if this symbol is not defined, infinities are 1 ULP away from +/- FLT_MAX
#define INFINITE_INFINITIES 1

// test whether two IEEE floats are within a specified number of representable values of each other
// This depends on the fact that IEEE floats are properly ordered when treated as signed magnitude integers
bool equal_float(float lhs, float rhs, unsigned __int32 max_ulp_difference)
{
#ifdef UNEQUAL_NANS
    if(is_nan(lhs) || is_nan(rhs))
    {
     return false;
    }
#endif
#ifdef INFINITE_INFINITIES
    if((is_finite(lhs) && !is_finite(rhs)) || (!is_finite(lhs) && is_finite(rhs)))
    {
     return false;
    }
#endif
    signed __int32 left(*reinterpret_cast<signed __int32*>(&lhs));
    // transform signed magnitude ints into 2s complement signed ints
    if(left < 0)
    {
     left = 0x80000000 - left;
    }
    signed __int32 right(*reinterpret_cast<signed __int32*>(&rhs));
    // transform signed magnitude ints into 2s complement signed ints
    if(right < 0)
    {
     right = 0x80000000 - right;
    }
    if(static_cast<unsigned __int32>(std::abs(left - right)) <= max_ulp_difference)
    {
     return true;
    }
    return false;
}

A similar technique can be used for doubles. The trick is to convert the floats so that they're ordered (as if integers) and then just see how different they are.

I have no idea why this damn thing is screwing up my underscores. Edit: Oh, perhaps that is just an artefact of the preview. That's OK then.

DrPizza
This wins hands-down for precision. But for performance... you'd have to trade off a bit of that accuracy for a speed improvement. Agreed?
OJ
If you need doubles or portability: I found a great cross-platform implementation of this that can deal with both doubles and floats in Google Test and posted it here: http://stackoverflow.com/questions/17333/most-effective-way-for-float-and-double-comparison/3423299#3423299
skrebbel
A: 

Oh dear lord please don't interpret the float bits as ints unless you're running on a P6 or earlier.

MSN

Mat Noguchi
A: 

Oh dear lord please don't interpret the float bits as ints unless you're running on a P6 or earlier.

Even if it causes it to copy from vector registers to integer registers via memory, and even if it stalls the pipeline, it's the best way to do it that I've come across, insofar as it provides the most robust comparisons even in the face of floating point errors.

i.e. it is a price worth paying.

DrPizza
+3  A: 

The current version I am using is this

bool is_equals(float A, float B,
               float maxRelativeError, float maxAbsoluteError)
{

  if (fabs(A - B) < maxAbsoluteError)
    return true;

  float relativeError;
  if (fabs(B) > fabs(A))
    relativeError = fabs((A - B) / B);
  else
    relativeError = fabs((A - B) / A);

  if (relativeError <= maxRelativeError)
    return true;

  return false;
}

This seems to take care of most problems by combining relative and absolute error tolerance. Is the ULP approach better? If so, why?

Craig H
A: 

it's the best way to do it that I've come across, insofar as it provides the most robust comparisons even in the face of floating point errors.

If you have floating point errors you have even more problems than this. Although I guess that is up to personal perspective.

MSN

Mat Noguchi
You will ALWAYS have floating point errors.
BCS
A: 

This seems to take care of most problems by combining relative and absolute error tolerance. Is the ULP approach better? If so, why?

ULPs are a direct measure of the "distance" between two floating point numbers. This means that they don't require you to conjure up the relative and absolute error values, nor do you have to make sure to get those values "about right". With ULPs, you can express directly how close you want the numbers to be, and the same threshold works just as well for small values as for large ones.

DrPizza
A: 

If you have floating point errors you have even more problems than this. Although I guess that is up to personal perspective.

Even if we do the numeric analysis to minimize accumulation of error, we can't eliminate it and we can be left with results that ought to be identical (if we were calculating with reals) but differ (because we cannot calculate with reals).

DrPizza
A: 

If you are looking for two floats to be equal, then they should be identically equal in my opinion. If you are facing a floating point rounding problem, perhaps a fixed point representation would suit your problem better.

Nick
dealing with (almost) equality is a common cases in loops where you want to stop when things get "close enough" but where you don't expect them to ever converge exactly.
BCS
A: 

If you are looking for two floats to be equal, then they should be identically equal in my opinion. If you are facing a floating point rounding problem, perhaps a fixed point representation would suit your problem better.

Perhaps we cannot afford the loss of range or performance that such an approach would inflict.

DrPizza
+2  A: 

@DrPizza

Even if we do the numeric analysis to minimize accumulation of error, we can't eliminate it and we can be left with results that ought to be identical (if we were calculating with reals) but differ (because we cannot calculate with reals).

Ah, yes. In terms of precision, your solution is clearly better.

MSN

Mat Noguchi
A: 

If you are looking for two floats to be equal, then they should be identically equal in my opinion. If you are facing a floating point rounding problem, perhaps a fixed point representation would suit your problem better.

Perhaps I should explain the problem better. In C++, the following code:

#include <iostream>

using namespace std;


int main()
{
  float a = 1.0;
  float b = 0.0;

  for(int i=0;i<10;++i)
  {
    b+=0.1;
  }

  if(a != b)
  {
    cout << "Something is wrong" << endl;
  }

  return 1;
}

prints the phrase "Something is wrong". Are you saying that it should?

Craig H
You will almost always get a != b because of floating point accuracy and rounding errors.
Jim Kramer
A: 

@DrPizza: I am no performance guru but I would expect fixed point operations to be quicker than floating point operations (in most cases).

@Craig H: Sure. I'm totally okay with it printing that. If a or b store money then they should be represented in fixed point. I'm struggling to think of a real world example where such logic ought to be allied to floats. Things suitable for floats:

  • weights
  • ranks
  • distances
  • real world values (like from a ADC)

For all these things, either you much then numbers and simply present the results to the user for human interpretation, or you make a comparative statement (even if such a statement is, "this thing is within 0.001 of this other thing"). A comparative statement like mine is only useful in the context of the algorithm: the "within 0.001" part depends on what physical question you're asking. That my 0.02. Or should I say 2/100ths?

Nick
+1  A: 

@DrPizza: I am no performance guru but I would expect fixed point operations to be quicker than floating point operations (in most cases).

It rather depends on what you are doing with them. A fixed-point type with the same range as an IEEE float would be many many times slower (and many times larger).

Things suitable for floats:

3D graphics, physics/engineering, simulation, climate simulation....

DrPizza
A: 

It rather depends on what you are doing with them. A fixed-point type with the same range as an IEEE float would be many many times slower (and many times larger).

Okay, but if I want a infinitesimally small bit-resolution then it's back to my original point: == and != have no meaning in the context of such a problem.

An int lets me express ~10^9 values (regardless of the range) which seems like enough for any situation where I would care about two of them being equal. And if that's not enough, use a 64-bit OS and you've got about 10^19 distinct values.

I can express values a range of 0 to 10^200 (for example) in an int, it is just the bit-resolution that suffers (resolution would be greater than 1, but, again, no application has that sort of range as well as that sort of resolution).

To summarize, I think in all cases one either is representing a continuum of values, in which case != and == are irrelevant, or one is representing a fixed set of values, which can be mapped to an int (or a another fixed-precision type).

Nick
A: 

Checkout this answer to a similar question.

grom
A: 

An int lets me express ~10^9 values (regardless of the range) which seems like enough for any situation where I would care about two of them being equal. And if that's not enough, use a 64-bit OS and you've got about 10^19 distinct values.

I have actually hit that limit... I was trying to juggle times in ps and time in clock cycles in a simulation where you easily hit 10^10 cycles. No matter what I did I very quickly overflowed the puny range of 64-bit integers... 10^19 is not as much as you think it is, gimme 128 bits computing now!

Floats allowed me to get a solution to the mathematical issues, as the values overflowed with lots zeros at the low end. So you basically had a decimal point floating aronud in the number with no loss of precision (I could like with the more limited distinct number of values allowed in the mantissa of a float compared to a 64-bit int, but desperately needed th range!).

And then things converted back to integers to compare etc.

Annoying, and in the end I scrapped the entire attempt and just relied on floats and < and > to get the work done. Not perfect, but works for the use case envisioned.

jakobengblom2