views: 282
answers: 3
I have the following bit of code; however, when compiling it with GCC 4.4 with various optimization flags I get some unexpected results when it's run.

#include <iostream>

int main()
{
   const unsigned int cnt = 10;
   double lst[cnt] = { 0.0 };
   const double v[4] = { 131.313, 737.373, 979.797, 731.137 };

   for(unsigned int i = 0; i < cnt; ++i) {
      lst[i] = v[i % 4] * i;
   }

   for(unsigned int i = 0; i < cnt; ++i) {
      double d = v[i % 4] * i;
      if(lst[i] != d) {
         std::cout << "error @ : " << i << std::endl;
         return 1;
      }
   }
   return 0;
}
  • when compiled with: "g++ -pedantic -Wall -Werror -O1 -o test test.cpp" I get the following output: "error @ : 3"

  • when compiled with: "g++ -pedantic -Wall -Werror -O2 -o test test.cpp" I get the following output: "error @ : 3"

  • when compiled with: "g++ -pedantic -Wall -Werror -O3 -o test test.cpp" I get no errors

  • when compiled with: "g++ -pedantic -Wall -Werror -o test test.cpp" I get no errors

I do not believe this to be an issue related to rounding or an epsilon difference in the comparison. I've tried this with Intel v10 and MSVC 9.0 and they both work as expected. I believe this should be nothing more than a bitwise compare.

If I replace the if-statement with the following: if (static_cast<long long int>(lst[i]) != static_cast<long long int>(d)), and add "-Wno-long-long", I get no errors in any of the optimization modes when run.

If I add std::cout << d << std::endl; before the "return 1", I get no errors in any of the optimization modes when run.
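Put together, the two workarounds look roughly like this in the second loop (just a sketch of what I tried, folded into one place for brevity, not a proposed fix):

for(unsigned int i = 0; i < cnt; ++i) {
   double d = v[i % 4] * i;
   // Truncating both sides to an integer type hides the mismatch...
   if(static_cast<long long int>(lst[i]) != static_cast<long long int>(d)) {
      // ...and so does merely touching d with an extra print before returning.
      std::cout << d << std::endl;
      std::cout << "error @ : " << i << std::endl;
      return 1;
   }
}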

Is this a bug in my code, or is there something wrong with GCC and the way it handles the double type?

Note: I've just tried this with gcc versions 4.3 and 3.3; the error is not exhibited.

Resolution: Mike Dinsdale noted the following bug report: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323 It seems the GCC team are not completely sure about the nature of the problem.

As suggested in the bug report, a possible resolution is to use the -ffloat-store option. I've tried this and it works; however, from a performance point of view the results are not that great, though YMMV.
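For reference, the flag just gets added to the compile lines shown above, e.g.:

g++ -pedantic -Wall -Werror -O2 -ffloat-store -o test test.cpp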

+6  A: 

The problem is likely the result of the value losing some precision when the result of an expression is stored to memory, while the compiler, as an optimization, skips that store for a local:

double d = v[i % 4] * i;  // the result, `d`, might be kept in a register 
                          //   instead of storing it in a memory location, 
                          //   keeping full precision

if(lst[i] != d) {         // the value stored in lst[i] may have lost some 
                          //   precision since it had to be stored in memory,
                          //   which might not be able to hold the full 
                          //   precision that the expression generated

The C99 standard says in 6.3.1.8/2 "Usual arithmetic conversions":

The values of floating operands and of the results of floating expressions may be represented in greater precision and range than that required by the type; the types are not changed thereby.
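One way to see this (a sketch of mine, not part of the question's code) is to force the computed value through a double-sized memory slot before comparing; a volatile temporary is a common way to get that effect on x87 targets:

volatile double tmp = v[i % 4] * i;  // forcing a store rounds the 80-bit
                                     //   register value to a 64-bit double
double d = tmp;                      // d now holds no more precision than
                                     //   the value stored in lst[i]

if(lst[i] != d) {                    // with the extra precision discarded,
                                     //   the comparison should now hold
}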

Michael Burr
This error is not seen when no optimization is enabled (the 4th option above); it is also not seen on the same system (Ubuntu) or on Windows using Intel v10.
Monomer
@Monomer: The effect of this behavior is very likely to be strongly influenced by implementation, platform, and optimization settings. This type of thing is one of the reasons why I'm quite happy that I rarely have to deal with floating point.
Michael Burr
@Michael: I agree; floating point issues have been the bane of my existence for the last couple of years. You think you've seen it all, then all of a sudden something else just pops up.
Monomer
A: 

The width of the floating point registers on x86 is different from the width of a double in RAM. Therefore, comparisons may succeed or fail depending entirely on how the compiler decides to optimize the loads of floating point values.
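As a small illustration (my own sketch; with gcc on x86, long double is the 80-bit extended type that the x87 registers use):

double a = 131.313;
unsigned int i = 3;
double stored = a * i;                  // rounded to 64 bits when written to memory
long double kept = (long double)a * i;  // roughly what the FPU register may still hold
// The low-order bits can differ, so (long double)stored == kept is not guaranteed.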

Zan Lynx
That is correct; however, shouldn't the compiler generate the correct code for the given expression?
Monomer
It does generate correct code. How is it not correct? In a decimal analogy, 3.333 is not equal to 3.333333. That is how the FPU sees it, only in binary instead of decimal.
Zan Lynx
The point I'm trying to make is that the values on both sides of the "==" are generated in the same manner. I understand that for doubles a == b and b == a may not produce the same result. But these values are bitwise identical; if they weren't, the long long compare approach would not work. Also, how does one explain the whole thing working if I add an extra print statement? Doesn't that smell a bit fishy?
Monomer
@Monomer: Look at the machine code. It compares a register value to a memory value. Those are not the same. The reason casting to long long works is that it takes the value out of the machine register, because the compiler knows a double isn't a long long.
Zan Lynx
+7  A: 

The fact that the result depends on the optimization settings suggests it might be the x87 extended precision messing with things (as Michael Burr says).

Here's some code I use (with gcc on x86 processors) to switch the extended precision off:

static const unsigned int PRECISION_BIT_MASK = 0x300;
///< bitmask to mask out all non-precision bits in the fpu control word \cite{INTEL}.
static const unsigned int EXTENDED_PRECISION_BITS = 0x300;
///< add to the fpu control word (after zeroing precision bits) to turn on extended precision \cite{INTEL}.
static const unsigned int STANDARD_PRECISION_BITS = 0x200;
///< add to the fpu control word (after zeroing precision bits) to turn off extended precision \cite{INTEL}.

void set_fpu_control_word(unsigned int mode)
{
  asm ("fldcw %0" : : "m" (*&mode));
}

unsigned int get_fpu_control_word()
{
  volatile unsigned int mode = 0;
  asm ("fstcw %0" : "=m" (*&mode));
  return mode;
}

bool fpu_set_extended_precision_is_on(bool state)
{
  unsigned int old_cw = get_fpu_control_word();
  unsigned int masked = old_cw & ~PRECISION_BIT_MASK;
  unsigned int new_cw;
  if(state)
    new_cw = masked + EXTENDED_PRECISION_BITS;
  else
    new_cw = masked + STANDARD_PRECISION_BITS;
  set_fpu_control_word(new_cw);
  return true;
}

bool fpu_get_extended_precision_is_on()
{
  unsigned int old_cw = get_fpu_control_word();
  return  ((old_cw & PRECISION_BIT_MASK) == 0x300);
}
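A possible way to use it (a sketch, assuming the functions above are visible to main) is to drop the x87 to standard double precision once at program start:

int main()
{
  fpu_set_extended_precision_is_on(false); // x87 now rounds results to 53-bit
                                           //   (double) precision in-register
  // ... run the test loops as before ...
  return 0;
}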

Or you can just run your code with valgrind, which doesn't simulate the 80-bit registers, and is probably easier for a short program like this!

Mike Dinsdale
@Mike: The extended-precision bits in the control word are on; I do think that this is an issue of 64-bit floating point values being downgraded to 32-bit floating point values.
Monomer
To 32-bit? Have you checked the difference, and is it really that large? If so, that looks like a bug in gcc. If not, I'd wager it's the 80-bit -> 64-bit truncation caused by saving the FPU registers to memory. Do you have valgrind installed? If so, I'd check whether the problem disappears when running your code through valgrind.
Mike Dinsdale
@Mike: A bug relating to saving the FPU state makes sense, since if I access the "d" value once more (i.e. add a print statement for d) the problem goes away.
Monomer
For gcc 4.3.3 the test seems to fail at -O1 and -O2 but not at -O0 or -O3, and using valgrind prevents all the failures, so I'm pretty sure it's the extended precision doing it. The gcc folks don't consider it a bug (though you might disagree!): http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
Mike Dinsdale
@Mike: That's an excellent find! I tried using the -ffloat-store option as they suggested; it works, but some of my unit tests are now running at around 90% of their usual speed. Looks like I'll be using Intel on Linux until clang comes fully into town.
Monomer
Glad to help! (It took me two days to work out what was happening the first time I came across this ;)) Have you tried switching the extended precision off, e.g. by calling my (ridiculously long-named!) fpu_set_extended_precision_is_on(false)? I think that's what VS et al. do implicitly, so it might avoid your speed hit.
Mike Dinsdale