+18  Q:

## Most effective way for float and double comparison

What would be the most efficient way to compare two doubles or two floats (single precision)?

Simply doing this is not correct:

``````bool CompareDoubles1 (double A, double B)
{
return A == B;
}
``````

But something like:

``````bool CompareDoubles2 (double A, double B)
{
double diff = A - B;
return (diff < EPSILON) && (-diff > EPSILON);
}
``````

Seems to waste processing.

Does anyone know a smarter float comparer?

+28  A:

The comparison with an epsilon value is what most people do (even in game programming).

You should change your implementation a little though:

``````bool AreSame(double a, double b)
{
return fabs(a - b) < EPSILON;
}
``````

Cheers!

Edit: Christer has added a stack of great info on this topic on a recent blog post. Enjoy.

This has been marked as accepted, then reverted, multiple times! Wtf? :)
@OJ: is there something wrong with the first code sample? I thought the only problem was in a situation like this: `float a = 3.4; if(a == 3.4){...}`, i.e. when you are comparing a stored floating point value with a literal. In this case, both numbers are stored, so they will have the same representation if equal, so what is the harm in doing `a == b`?
@Lazer: http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm.
A:

would it be more efficient to add

``````if (d1 == d2) return true;
``````

at the beginning of the function?

Note: this is for the case in which the bits are identical.
+2  A:

@ebel gil

Comparing floating point numbers using == is not reliable. That's why Alex made the function.
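
A minimal sketch of why `==` surprises people, assuming IEEE 754 doubles evaluated in double precision (typical on x86-64). The helper names `SumEqualsExactly` and `SumEqualsWithin` are hypothetical and exist only for illustration:

```cpp
#include <cassert>
#include <cmath>

// Exact comparison: true only if a + b rounds to exactly the same double
// as expected.
bool SumEqualsExactly(double a, double b, double expected) {
    const double sum = a + b;
    return sum == expected;
}

// Tolerance comparison: true if a + b lands within tol of expected.
bool SumEqualsWithin(double a, double b, double expected, double tol) {
    assert(tol >= 0.0);
    return std::fabs((a + b) - expected) < tol;
}
```

Under those assumptions `SumEqualsExactly(0.1, 0.2, 0.3)` is false, because the rounded sum is 0.30000000000000004, while `SumEqualsWithin(0.1, 0.2, 0.3, 1e-9)` is true. Values that are exactly representable, such as 0.5 + 0.25, still compare equal with `==`.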

+7  A:

The code you wrote is bugged :

``````return (diff < EPSILON) && (-diff > EPSILON);
``````

The correct code would be :

``````return (diff < EPSILON) && (diff > -EPSILON);
``````

(...and yes this is different)
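
To see the difference concretely, here is a small sketch that wraps both expressions in functions. `BuggyCompare` and `FixedCompare` are hypothetical names, and the tolerance is passed in as a parameter instead of using a global `EPSILON`:

```cpp
#include <cassert>

// The question's original test. The second clause is equivalent to
// diff < -eps, so it accepts only pairs where a is well BELOW b.
bool BuggyCompare(double a, double b, double eps) {
    assert(eps > 0.0);
    const double diff = a - b;
    return (diff < eps) && (-diff > eps);
}

// The corrected test: accepts exactly the pairs with -eps < diff < eps.
bool FixedCompare(double a, double b, double eps) {
    assert(eps > 0.0);
    const double diff = a - b;
    return (diff < eps) && (diff > -eps);
}
```

With `eps = 1e-9`, `BuggyCompare(1.0, 1.0, eps)` is false and `BuggyCompare(1.0, 2.0, eps)` is true, which is the opposite of what was intended. `FixedCompare` gives true and false respectively.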

I wonder if fabs wouldn't make you lose lazy evaluation in some cases. I would say it depends on the compiler. You might want to try both. If they are equivalent on average, take the implementation with fabs.

If you have some info on which of the two floats is more likely to be bigger than the other, you can play on the order of the comparison to take better advantage of the lazy evaluation.

Finally you might get better result by inlining this function. Not likely to improve much though...

Edit: OJ, thanks for correcting your code. I erased my comment accordingly

A:

@OJ

The tags specify C++

A:

to John:

Yes, but == compares the bits one by one. Assuming the two doubles are often equal bit for bit (as they were computed on the same machine doing the same math), it might save time to check for absolute (bitwise) equality before computing diff?

+2  A:

@Ebel

Doing an absolute comparison first is largely a waste of time. For one thing, unless your numbers were generated from exactly the same sources, using exactly the same operations, and in exactly the same order, they are unlikely to be absolutely equal regardless of machine architecture.

For another, branching on many chips, especially PowerPC chips as found in the 360/PS3, can be more expensive than doing the actual subtraction / abs.

A:

would it be more efficient to add ... in the beginning of the function?

`<invoke Knuth>`Premature optimization is the root of all evil.`</invoke Knuth>` Just go with abs(a-b) < EPS as noted above, it's clear and easy to understand.

Don't forget how that Knuth quote continues: *Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified.*
It looks like you skipped the last clause of your very quote -- "only after that [critical] code has been identified." Unless this code is identified as a bottleneck, optimizing it beyond clarity is a waste of time (and potentially harmful).
+15  A:

For a more in depth approach read Comparing floating point numbers. Here is the code snippet from that link:

``````// Usable AlmostEqual function
bool AlmostEqual2sComplement(float A, float B, int maxUlps)
{
// Make sure maxUlps is non-negative and small enough that the
// default NAN won't compare as equal to anything.
assert(maxUlps > 0 && maxUlps < 4 * 1024 * 1024);
int aInt = *(int*)&A;
// Make aInt lexicographically ordered as a twos-complement int
if (aInt < 0)
aInt = 0x80000000 - aInt;
// Make bInt lexicographically ordered as a twos-complement int
int bInt = *(int*)&B;
if (bInt < 0)
bInt = 0x80000000 - bInt;
int intDiff = abs(aInt - bInt);
if (intDiff <= maxUlps)
return true;
return false;
}
``````
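
The pointer casts in that snippet (`*(int*)&A`) break the strict aliasing rules of standard C++, and `aInt - bInt` can overflow for operands of opposite sign. A hedged sketch of the same idea using `std::memcpy` and 64-bit arithmetic follows; it assumes a 32-bit IEEE 754 `float`, and the names `OrderedBits` and `AlmostEqualUlps` are mine, not from the linked article:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t), "assumes 32-bit float");

// Reinterpret the float's bits and remap them so that integer order matches
// float order (negative floats are stored as sign-and-magnitude).
std::int64_t OrderedBits(float x) {
    std::int32_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined, unlike *(int*)&x
    const std::int64_t wide = bits;
    return wide < 0 ? INT32_MIN - wide : wide;
}

// True if a and b are at most maxUlps representable floats apart.
bool AlmostEqualUlps(float a, float b, std::int64_t maxUlps) {
    assert(maxUlps >= 0);
    if (std::isnan(a) || std::isnan(b)) return false;  // NaN equals nothing
    const std::int64_t diff = OrderedBits(a) - OrderedBits(b);
    return (diff < 0 ? -diff : diff) <= maxUlps;
}
```

Widening to 64 bits before subtracting is a design choice to keep the difference from overflowing; the explicit NaN check replaces the original's reliance on a small `maxUlps`.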
A:

@Andrew

Good points !

A:

It depends on how precise you want the comparison to be. If you want to compare for exactly the same number, then just go with ==. (You almost never want to do this unless you actually want exactly the same number.) On any decent platform you can also do the following:

`diff= a - b; return fabs(diff)<EPSILON;`

as `fabs` tends to be pretty fast. By pretty fast I mean it is basically a bitwise AND, so it better be fast.
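
As an illustration of the "bitwise AND" remark, the sketch below clears the sign bit of a double by hand. It assumes a 64-bit IEEE 754 `double`; `AbsViaMask` is a hypothetical name, and in real code you would simply call `std::fabs`:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit double");

// Illustrative only: compute |x| by masking off the sign bit.
double AbsViaMask(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFFFFFFFFFULL;  // keep exponent and fraction, drop the sign
    std::memcpy(&x, &bits, sizeof x);
    assert(!std::signbit(x));       // postcondition: the sign bit is clear
    return x;
}
```

For ordinary values this matches `std::fabs`, for example `AbsViaMask(-3.5)` yields 3.5.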

And integer tricks for comparing doubles and floats are nice but tend to make it more difficult for the various CPU pipelines to handle effectively. And it's definitely not faster on certain in-order architectures these days due to using the stack as a temporary storage area for values that are being used frequently. (Load-hit-store for those who care.)

MSN

+1  A:

General-purpose comparison of floating-point numbers is generally meaningless. How to compare really depends on the problem at hand. In many problems, numbers are sufficiently discretized to allow comparing them within a given tolerance. Unfortunately, there are just as many problems where such a trick doesn't really work. For example, consider working with a Heaviside (step) function of the number in question (digital stock options come to mind) when your observations are very close to the barrier. Performing a tolerance-based comparison wouldn't do much good, as it would effectively shift the issue from the original barrier to two new ones. Again, there is no general-purpose solution for such problems, and the particular solution might require going as far as changing the numerical method in order to achieve stability.

+6  A:

`return fabs(a - b) < EPSILON;`

This is fine if:

• the order of magnitude of your inputs doesn't change much
• very small numbers of opposite signs can be treated as equal

But otherwise it'll lead you into trouble. Double precision numbers have a resolution of about 16 decimal places. If the two numbers you are comparing are larger in magnitude than EPSILON*1.0E16, then you might as well be saying:

``````return a==b;
``````
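
A small sketch of that degeneration, assuming IEEE 754 doubles. `kAbsEpsilon` is set to 1.0E-8 purely as an example value, and both function names are hypothetical:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

const double kAbsEpsilon = 1.0e-8;  // example tolerance, an assumption

bool AbsSame(double a, double b) {
    return std::fabs(a - b) < kAbsEpsilon;
}

// True when even the closest distinct double above x fails AbsSame, i.e.
// when the absolute test can only succeed (looking upward) for y == x.
bool DegeneratesToExactAbove(double x) {
    assert(std::isfinite(x));
    const double next =
        std::nextafter(x, std::numeric_limits<double>::infinity());
    return !AbsSame(x, next);
}
```

Around 1.0 neighbouring doubles are about 2.2E-16 apart, so the tolerance has room to work. Around 1.0E9 they are about 1.2E-7 apart, which already exceeds 1.0E-8, so `DegeneratesToExactAbove(1.0e9)` is true.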

I'll examine a different approach that assumes you need to worry about the first issue and that the second is fine for your application. A solution would be something like:

``````#define VERYSMALL  (1.0E-150)
#define EPSILON    (1.0E-8)
bool AreSame(double a, double b)
{
double absDiff = fabs(a - b);
if (absDiff < VERYSMALL)
{
return true;
}

double maxAbs  = max(fabs(a) - fabs(b));
return (absDiff/maxAbs) < EPSILON;
}
``````

This is expensive computationally, but it is sometimes what is called for. This is what we have to do at my company because we deal with an engineering library and inputs can vary by a few dozen orders of magnitude.

Anyway, the point is this (and applies to practically every programming problem): Evaluate what your needs are, then come up with a solution to address your needs -- don't assume the easy answer will address your needs. If after your evaluation you find that `fabs(a-b) < EPSILON` will suffice, perfect -- use it! But be aware of its shortcomings and other possible solutions too.

Aside from the typos (s/-/,/ missing comma in fmax()), this implementation has a bug for numbers near zero that are within EPSILON, but not quite VERYSMALL yet. E.g., AreSame(1.0E-10, 1.0E-9) reports false because the relative error is huge. You get to be the hero at your company.
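
One possible way to address both points raised in this comment is to let an absolute tolerance decide near zero and a relative tolerance decide elsewhere. This is only a sketch: `AreSameCombined` and the default values of `absTol` and `relTol` are assumptions, and infinities and NaN are not handled specially:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// absTol and relTol are illustrative defaults, not values from the answer.
bool AreSameCombined(double a, double b,
                     double absTol = 1.0e-8, double relTol = 1.0e-8) {
    assert(absTol >= 0.0 && relTol >= 0.0);
    const double absDiff = std::fabs(a - b);
    if (absDiff <= absTol) {
        return true;  // near zero: the absolute test decides
    }
    // Note the comma: the larger of the two magnitudes.
    const double maxAbs = std::max(std::fabs(a), std::fabs(b));
    return absDiff <= relTol * maxAbs;  // elsewhere: the relative test decides
}
```

With these defaults `AreSameCombined(1.0E-10, 1.0E-9)` is true, while inputs of large magnitude are still compared relative to their size.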
+9  A:

The portable way to get epsilon in C++ is

``````#include <limits>
std::numeric_limits<double>::epsilon()
``````

Then the comparison function becomes

``````#include <cmath>
#include <limits>

bool AreSame(double a, double b) {
return std::fabs(a - b) < std::numeric_limits<double>::epsilon();
}
``````
Write `std::fabs` instead of `fabs` if you want to stay consistent.
You'll want a multiple of that epsilon most likely.
Can't you just use std::abs? AFAIK, std::abs is overloaded for doubles as well. Please warn me if I'm wrong.
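
To make these comments concrete: `std::abs` does have `double` overloads in `<cmath>`, and the raw machine epsilon only suits values of magnitude around 1. The sketch below scales it by the size of the inputs; `SameScaledEpsilon` and its `ulps` parameter are hypothetical, and the default of 4 is an arbitrary choice:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Raw machine epsilon: only meaningful for values of magnitude around 1.
bool SameRawEpsilon(double a, double b) {
    return std::abs(a - b) < std::numeric_limits<double>::epsilon();
}

// Scaled: tolerate roughly `ulps` units of rounding error at the magnitude
// of the inputs.
bool SameScaledEpsilon(double a, double b, int ulps = 4) {
    assert(ulps > 0);
    const double scale = std::max(std::abs(a), std::abs(b));
    return std::abs(a - b) <=
           std::numeric_limits<double>::epsilon() * scale * ulps;
}
```

For 1000.0 and the next double above it, the raw version reports a mismatch because the two values are about 1.1E-13 apart, while the scaled version accepts them.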
+22  A:

Be extremely careful using any of the suggestions above. It all depends on context.

I have spent a long time tracing bugs in a system that presumed a=b if |a-b|<epsilon. The underlying problems were:

1. The implicit presumption in an algorithm that if a=b and b=c then a=c.

2. Using the same epsilon for lines measured in inches and lines measured in mils (.001 inch). That is a=b but 1000a!=1000b. (This is why AlmostEqual2sComplement asks for the epsilon or max ULPS).

3. The use of the same epsilon for both the cosine of angles and the length of lines!

4. Using such a compare function to sort items in a collection. (In this case using the builtin C++ operator == for doubles produced correct results.)

Like I said: it all depends on context and the expected size of a and b.
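
The first problem in the list, the loss of transitivity, is easy to reproduce. A minimal sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cmath>

bool NearlyEqual(double a, double b, double eps) {
    assert(eps > 0.0);
    return std::fabs(a - b) < eps;
}

// True when a~b and b~c hold but a~c does not, i.e. transitivity fails.
bool BreaksTransitivity(double a, double b, double c, double eps) {
    return NearlyEqual(a, b, eps) && NearlyEqual(b, c, eps) &&
           !NearlyEqual(a, c, eps);
}
```

With `eps = 1.0`, the values 0.0, 0.75 and 1.5 break transitivity: each neighbouring pair is "equal", yet the outer pair is not. This is also why such a function is unsafe as the equivalence behind a sort or an associative container.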

BTW, std::numeric_limits::epsilon() is the "machine epsilon". It is the difference between 1 and the next representable value greater than 1. I guess it could be used in the compare function, but only if the expected values are less than 1.

Also, if you basically have int arithmetic in doubles (here we use doubles to hold int values in certain cases) your arithmetic will be correct. For example 4.0/2.0 will be the same as 1.0+1.0. This is as long as you do not do things that result in fractions (4.0/3.0) or do not go outside of the size of an int.
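
A short sketch of that point, assuming IEEE 754 binary64 doubles, where every integer up to 2^53 in magnitude is exactly representable. `IsExactInteger` and `DemoExactIntegerMath` are hypothetical names:

```cpp
#include <cassert>
#include <cmath>

// True if x holds an integer value in the range where doubles have no gaps
// between consecutive integers, i.e. |x| <= 2^53.
bool IsExactInteger(double x) {
    const double kLimit = 9007199254740992.0;  // 2^53
    return std::isfinite(x) && std::fabs(x) <= kLimit && std::trunc(x) == x;
}

void DemoExactIntegerMath() {
    assert(4.0 / 2.0 == 1.0 + 1.0);      // both sides are exactly 2.0
    assert(!IsExactInteger(4.0 / 3.0));  // a fraction: rounding now applies
}
```

Beyond 2^53 consecutive integers are no longer all representable, so 2^53 + 1 rounds back to 2^53.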

Interesting points. +1.
+1 for pointing out the obvious (that often gets ignored). For a generic method, you can make the epsilon relative to `fabs(a)+fabs(b)` but with compensating for NaN, 0 sum and overflow, this gets quite complex.
A:

Here is the way it is implemented in the Boost Test Library: http://www.boost.org/doc/libs/1_36_0/libs/test/doc/html/utf/testing-tools/floating_point_comparison.html

+5  A:

Comparing floating point numbers depends on the context. Since even changing the order of operations can produce different results, it is important to know how "equal" you want the numbers to be.

Comparing floating point numbers by Bruce Dawson is a good place to start when looking at floating point comparison.

The following definitions are from The art of computer programming by Knuth:

``````bool approximatelyEqual(float a, float b, float epsilon)
{
return fabs(a - b) <= ( (fabs(a) < fabs(b) ? fabs(b) : fabs(a)) * epsilon);
}

bool essentiallyEqual(float a, float b, float epsilon)
{
return fabs(a - b) <= ( (fabs(a) > fabs(b) ? fabs(b) : fabs(a)) * epsilon);
}

bool definitelyGreaterThan(float a, float b, float epsilon)
{
return (a - b) > ( (fabs(a) < fabs(b) ? fabs(b) : fabs(a)) * epsilon);
}

bool definitelyLessThan(float a, float b, float epsilon)
{
return (b - a) > ( (fabs(a) < fabs(b) ? fabs(b) : fabs(a)) * epsilon);
}
``````

Of course, choosing epsilon depends on the context, and determines how equal you want the numbers to be.
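
The difference between the first two definitions is which operand scales the tolerance: the larger one (weaker test) or the smaller one (stronger test). A sketch restated for `double` so that it stands alone; the capitalised names are mine:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Tolerance scaled by the LARGER magnitude: the weaker of the two tests.
bool ApproximatelyEqual(double a, double b, double epsilon) {
    assert(epsilon >= 0.0);
    return std::fabs(a - b) <= std::max(std::fabs(a), std::fabs(b)) * epsilon;
}

// Tolerance scaled by the SMALLER magnitude: the stronger of the two tests.
bool EssentiallyEqual(double a, double b, double epsilon) {
    assert(epsilon >= 0.0);
    return std::fabs(a - b) <= std::min(std::fabs(a), std::fabs(b)) * epsilon;
}
```

For a = 100, b = 95 and epsilon = 0.051, the difference 5 is within 100 * 0.051 = 5.1 but not within 95 * 0.051 = 4.845, so the pair is approximately equal but not essentially equal.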

Another method of comparing floating point numbers is to look at the ULP (units in last place) of the numbers. While not dealing specifically with comparisons, the paper What every computer scientist should know about floating point numbers is a good resource for understanding how floating point works and what the pitfalls are, including what ULP is.
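
To get a feel for how large one ULP is at different magnitudes, here is a minimal sketch built on `std::nextafter`, assuming IEEE 754 doubles. `UlpAt` is a hypothetical helper name:

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Size of one ULP at x: the distance from |x| to the next representable
// double above it.
double UlpAt(double x) {
    assert(std::isfinite(x));
    const double ax = std::fabs(x);
    return std::nextafter(ax, std::numeric_limits<double>::infinity()) - ax;
}
```

At 1.0 the result equals `std::numeric_limits<double>::epsilon()`, while at 1.0E16 one ULP is already 2.0, which is why a fixed absolute tolerance cannot suit every magnitude.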

+1  A:

I found that the Google C++ Testing Framework contains a nice cross-platform template-based implementation of AlmostEqual2sComplement which works on both doubles and floats. Given that it is released under the BSD license, using it in your own code should be no problem, as long as you retain the license. I extracted the below code from http://code.google.com/p/googletest/source/browse/trunk/include/gtest/internal/gtest-internal.h and added the license on top.

Be sure to #define GTEST_OS_WINDOWS to some value (or to change the code where it's used to something that fits your codebase - it's BSD licensed after all).

Usage example:

``````double left  = // something
double right = // something
const FloatingPoint<double> lhs(left), rhs(right);

if (lhs.AlmostEquals(rhs)) {
//they're equal!
}
``````

Here's the code:

``````// Copyright 2005, Google Inc.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
//     * Redistributions of source code must retain the above copyright
// notice, this list of conditions and the following disclaimer.
//     * Redistributions in binary form must reproduce the above
// copyright notice, this list of conditions and the following disclaimer
// in the documentation and/or other materials provided with the
// distribution.
//     * Neither the name of Google Inc. nor the names of its
// contributors may be used to endorse or promote products derived from
// this software without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Authors: [email protected] (Zhanyong Wan), [email protected] (Sean Mcafee)
//

// This template class serves as a compile-time function from size to
// type.  It maps a size in bytes to a primitive type with that
// size. e.g.
//
//   TypeWithSize<4>::UInt
//
// is typedef-ed to be unsigned int (unsigned integer made up of 4
// bytes).
//
// Such functionality should belong to STL, but I cannot find it
// there.
//
// Google Test uses this class in the implementation of floating-point
// comparison.
//
// For now it only handles UInt (unsigned int) as that's all Google Test
// needs.  Other types can be easily added in the future if need
// arises.
template <size_t size>
class TypeWithSize {
public:
// This prevents the user from using TypeWithSize<N> with incorrect
// values of N.
typedef void UInt;
};

// The specialization for size 4.
template <>
class TypeWithSize<4> {
public:
// unsigned int has size 4 in both gcc and MSVC.
//
// As base/basictypes.h doesn't compile on Windows, we cannot use
// uint32, uint64, and etc here.
typedef int Int;
typedef unsigned int UInt;
};

// The specialization for size 8.
template <>
class TypeWithSize<8> {
public:
#if GTEST_OS_WINDOWS
typedef __int64 Int;
typedef unsigned __int64 UInt;
#else
typedef long long Int;  // NOLINT
typedef unsigned long long UInt;  // NOLINT
#endif  // GTEST_OS_WINDOWS
};

// This template class represents an IEEE floating-point number
// (either single-precision or double-precision, depending on the
// template parameters).
//
// The purpose of this class is to do more sophisticated number
// comparison.  (Due to round-off error, etc, it's very unlikely that
// two floating-points will be equal exactly.  Hence a naive
// comparison by the == operation often doesn't work.)
//
// Format of IEEE floating-point:
//
//   The most-significant bit being the leftmost, an IEEE
//   floating-point looks like
//
//     sign_bit exponent_bits fraction_bits
//
//   Here, sign_bit is a single bit that designates the sign of the
//   number.
//
//   For float, there are 8 exponent bits and 23 fraction bits.
//
//   For double, there are 11 exponent bits and 52 fraction bits.
//
//   More details can be found at
//   http://en.wikipedia.org/wiki/IEEE_floating-point_standard.
//
// Template parameter:
//
//   RawType: the raw floating-point type (either float or double)
template <typename RawType>
class FloatingPoint {
public:
// Defines the unsigned integer type that has the same size as the
// floating point number.
typedef typename TypeWithSize<sizeof(RawType)>::UInt Bits;

// Constants.

// # of bits in a number.
static const size_t kBitCount = 8*sizeof(RawType);

// # of fraction bits in a number.
static const size_t kFractionBitCount =
std::numeric_limits<RawType>::digits - 1;

// # of exponent bits in a number.
static const size_t kExponentBitCount = kBitCount - 1 - kFractionBitCount;

// The mask for the sign bit.
static const Bits kSignBitMask = static_cast<Bits>(1) << (kBitCount - 1);

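// The mask for the fraction bits.
static const Bits kFractionBitMask =
~static_cast<Bits>(0) >> (kExponentBitCount + 1);

// The mask for the exponent bits.
static const Bits kExponentBitMask = ~(kSignBitMask | kFractionBitMask);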

// How many ULP's (Units in the Last Place) we want to tolerate when
// comparing two numbers.  The larger the value, the more error we
// allow.  A 0 value means that two numbers must be exactly the same
// to be considered equal.
//
// The maximum error of a single floating-point operation is 0.5
// units in the last place.  On Intel CPU's, all floating-point
// calculations are done with 80-bit precision, while double has 64
// bits.  Therefore, 4 should be enough for ordinary use.
//
// See the following article for more details on ULP:
// http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm.
static const size_t kMaxUlps = 4;

// Constructs a FloatingPoint from a raw floating-point number.
//
// On an Intel CPU, passing a non-normalized NAN (Not a Number)
// around may change its bits, although the new value is guaranteed
// to be also a NAN.  Therefore, don't expect this constructor to
// preserve the bits in x when x is a NAN.
explicit FloatingPoint(const RawType& x) { u_.value_ = x; }

// Static methods

// Reinterprets a bit pattern as a floating-point number.
//
// This function is needed to test the AlmostEquals() method.
static RawType ReinterpretBits(const Bits bits) {
FloatingPoint fp(0);
fp.u_.bits_ = bits;
return fp.u_.value_;
}

// Returns the floating-point number that represent positive infinity.
static RawType Infinity() {
return ReinterpretBits(kExponentBitMask);
}

// Non-static methods

// Returns the bits that represents this number.
const Bits &bits() const { return u_.bits_; }

// Returns the exponent bits of this number.
Bits exponent_bits() const { return kExponentBitMask & u_.bits_; }

// Returns the fraction bits of this number.
Bits fraction_bits() const { return kFractionBitMask & u_.bits_; }

// Returns the sign bit of this number.
Bits sign_bit() const { return kSignBitMask & u_.bits_; }

// Returns true iff this is NAN (not a number).
bool is_nan() const {
// It's a NAN if the exponent bits are all ones and the fraction
// bits are not entirely zeros.
return (exponent_bits() == kExponentBitMask) && (fraction_bits() != 0);
}

// Returns true iff this number is at most kMaxUlps ULP's away from
// rhs.  In particular, this function:
//
//   - returns false if either number is (or both are) NAN.
//   - treats really large numbers as almost equal to infinity.
//   - thinks +0.0 and -0.0 are 0 ULP's apart.
bool AlmostEquals(const FloatingPoint& rhs) const {
// The IEEE standard says that any comparison operation involving
// a NAN must return false.
if (is_nan() || rhs.is_nan()) return false;

return DistanceBetweenSignAndMagnitudeNumbers(u_.bits_, rhs.u_.bits_)
<= kMaxUlps;
}

private:
// The data type used to store the actual floating-point number.
union FloatingPointUnion {
RawType value_;  // The raw floating-point number.
Bits bits_;      // The bits that represent the number.
};

// Converts an integer from the sign-and-magnitude representation to
// the biased representation.  More precisely, let N be 2 to the
// power of (kBitCount - 1), an integer x is represented by the
// unsigned number x + N.
//
// For instance,
//
//   -N + 1 (the most negative number representable using
//          sign-and-magnitude) is represented by 1;
//   0      is represented by N; and
//   N - 1  (the biggest number representable using
//          sign-and-magnitude) is represented by 2N - 1.
//
// for more details on signed number representations.
static Bits SignAndMagnitudeToBiased(const Bits &sam) {
if (kSignBitMask & sam) {
// sam represents a negative number.
return ~sam + 1;
} else {
// sam represents a positive number.
return kSignBitMask | sam;
}
}

// Given two numbers in the sign-and-magnitude representation,
// returns the distance between them as an unsigned number.
static Bits DistanceBetweenSignAndMagnitudeNumbers(const Bits &sam1,
const Bits &sam2) {
const Bits biased1 = SignAndMagnitudeToBiased(sam1);
const Bits biased2 = SignAndMagnitudeToBiased(sam2);
return (biased1 >= biased2) ? (biased1 - biased2) : (biased2 - biased1);
}

FloatingPointUnion u_;
};
``````
+100: This is the best answer here!
A:

I'd be very wary of any of these answers that involves floating point subtraction (e.g., fabs(a-b) < epsilon). First, the floating point numbers become more sparse at greater magnitudes and at high enough magnitudes where the spacing is greater than epsilon, you might as well just be doing a == b. Second, subtracting two very close floating point numbers (as these will tend to be, given that you're looking for near equality) is exactly how you get catastrophic cancellation.
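
A sketch of the cancellation effect, assuming IEEE 754 doubles. The subtraction of two nearby doubles is itself exact; what it does is expose the rounding error already present in the operands. `CancellationRelativeError` is a hypothetical name:

```cpp
#include <cassert>
#include <cmath>

// Relative error of computing (1 + delta) - 1 in double arithmetic,
// compared with the mathematically exact answer delta.
double CancellationRelativeError(double delta) {
    assert(delta > 0.0);
    // The sum is rounded to a multiple of 2^-52 here; volatile forces the
    // value to be stored as a double before it is reused.
    const volatile double shifted = 1.0 + delta;
    const double recovered = shifted - 1.0;  // exact, but the damage is done
    return std::fabs(recovered - delta) / delta;
}
```

For `delta = 1.0E-15` the recovered value is off by roughly 11 percent, whereas for an exactly representable `delta` such as 0.5 there is no error at all.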

While not portable, I think grom's answer does the best job of avoiding these issues.