Having already read this question, I'm reasonably certain that a given process using floating point arithmetic with the same input (on the same hardware, compiled with the same compiler) should be deterministic. I'm looking at a case where this isn't true and trying to determine what could have caused it.

I've compiled an executable and I'm feeding it the exact same data, running on a single machine (non-multithreaded), but I'm getting errors of about 3.814697265625e-06, which after careful googling I found is exactly 1/4^9 = 1/2^18 = 1/262144. That is pretty close to the precision level of a 32-bit floating point number (approx. 7 digits according to Wikipedia).
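The size of that error can be checked directly. The following is a minimal sketch, not taken from the question: it assumes (purely for illustration) that the affected values lie in the range [32, 64), where adjacent 32-bit floats are exactly 2^-18 apart, so the observed discrepancy would be a single unit in the last place. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// The discrepancy reported in the question, written as a power of two.
double observed_error() {
    return std::ldexp(1.0, -18);  // 2^-18 = 1/262144
}

// Gap between x and the next representable float above it (1 ulp at x).
float ulp_at(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Illustrative self-check; call it from main() to run the sketch.
void check_error_size() {
    assert(observed_error() == 1.0 / 262144.0);
    // Assumption: values in [32, 64). There, 1 ulp is exactly 2^-18.
    assert(ulp_at(32.0f) == observed_error());
}
```

If the real values are in a different range, the same error corresponds to a different number of ulps, but it still suggests a difference in the last bit or two of a single-precision result.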

My suspicion is that it has something to do with optimisations that have been applied to the code. I'm using the Intel C++ compiler and have set floating point speculation to fast instead of safe or strict. Could this make a floating point process non-deterministic? Are there other optimisations that could lead to this behaviour?

EDIT: As per Pax's suggestion I recompiled the code with floating point speculation turned to safe and I'm now getting stable results. This allows me to clarify this question - what does floating-point-speculation actually do and how can this cause the same binary (i.e. one compilation, multiple runs) to generate different results when applied to the exact same input?

@Ben I'm compiling using Intel(R) C++ 11.0.061 [IA-32] and I'm running on an Intel quad-core processor.

+5  A: 

In almost any situation where there's a fast mode and a safe mode, you'll find a trade-off of some sort. Otherwise everything would run in fast-safe mode :-).

And, if you're getting different results with the same input, your process is not deterministic, no matter how much you believe it to be (in spite of the empirical evidence).

I would say your explanation is the most likely. Put it in safe mode and see if the non-determinism goes away. That will tell you for sure.

As to whether there are other optimizations: if you're compiling on the same hardware with the same compiler/linker and the same options to those tools, it should generate identical code. I can't see any possibility other than the fast mode (or bit rot in the memory due to cosmic rays, but that's pretty unlikely).

Following your update:

Intel has a document here which explains some of the things they're not allowed to do in safe mode, including but not limited to:

  • reassociation: (a+b)+c -> a+(b+c).
  • zero folding: x + 0 -> x, x * 0 -> 0.
  • reciprocal multiply: a/b -> a*(1/b).
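To see why the first two transformations are not value-preserving, here is a minimal sketch in plain C++ (not Intel-specific, and the function names are hypothetical). The `volatile` temporaries are only there to force each intermediate result to be rounded to a 32-bit float, so the behaviour does not depend on the compiler keeping extra precision.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// (a + b) + c, with the intermediate sum rounded to float.
float sum_left(float a, float b, float c) {
    volatile float t = a + b;
    return t + c;
}

// a + (b + c), the reassociated form.
float sum_right(float a, float b, float c) {
    volatile float t = b + c;
    return a + t;
}

// True when x * 0 is NaN, i.e. when folding x * 0 -> 0 would be wrong.
bool times_zero_is_nan(float x) {
    volatile float zero = 0.0f;
    return std::isnan(x * zero);
}

// Illustrative self-check; call it from main() to run the sketch.
void check_unsafe_rewrites() {
    // Floats near 1e8 are spaced 8 apart, so -1e8 + 1 rounds back to -1e8.
    assert(sum_left(1e8f, -1e8f, 1.0f) == 1.0f);   // (1e8 - 1e8) + 1
    assert(sum_right(1e8f, -1e8f, 1.0f) == 0.0f);  // 1e8 + (-1e8 + 1)
    // infinity * 0 is NaN, not 0.
    assert(times_zero_is_nan(std::numeric_limits<float>::infinity()));
}
```

The inputs are deliberately extreme. In real code the two groupings usually differ only in the last bit or so, which is consistent with the size of the error in the question.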

While you state that these operations are compile-time defined, the Intel chips are pretty darned clever. They can re-order instructions to keep pipelines full in multi-CPU set-ups so, unless the code specifically prohibits such behavior, things may change at run-time (not compile-time) to keep things going at full speed.

This is covered (briefly) on page 15 of that linked document that talks about vectorization ("Issue: different results re-running the same binary on the same data on the same processor").
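One way the same binary can group a sum differently from run to run is through memory alignment: a vectorized loop may process a few leading elements one at a time until it reaches an aligned address, and how many it processes depends on where the data happens to sit. The following is a scalar simulation of that idea, not the compiler's actual generated code. The `peel` parameter, the four-lane layout, and the function names are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Sum n floats the way a 4-lane vectorized reduction might:
// `peel` leading elements are added one by one, the aligned middle is
// accumulated in 4 independent lanes, and any leftover tail is scalar.
float sum_with_peel(const float* d, std::size_t n, std::size_t peel) {
    if (peel > n) peel = n;

    float head = 0.0f;
    for (std::size_t i = 0; i < peel; ++i) head += d[i];

    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    std::size_t i = peel;
    for (; i + 4 <= n; i += 4)
        for (std::size_t k = 0; k < 4; ++k) lane[k] += d[i + k];

    float tail = 0.0f;
    for (; i < n; ++i) tail += d[i];

    // Combine the partial sums in a fixed order.
    return head + ((lane[0] + lane[1]) + (lane[2] + lane[3])) + tail;
}

// Illustrative self-check; call it from main() to run the sketch.
void check_peel_changes_sum() {
    const float data[5] = {1e8f, 1.0f, -1e8f, 1.0f, 1.0f};  // exact sum is 3
    assert(sum_with_peel(data, 5, 0) == 1.0f);  // as if the data were aligned
    assert(sum_with_peel(data, 5, 1) == 0.0f);  // as if one element were peeled
}
```

Same data, same code, different grouping, different answer. In this simulation `peel` is an explicit argument; in a real vectorized binary the equivalent quantity would be derived from the address of the data at run time, which is why two runs of one executable can disagree.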

My advice would be to decide whether you need raw grunt or total reproducibility of results and then choose the mode based on that.

paxdiablo
Thanks for the good explanation and resources. The document you've linked does state that this problem (where the global stack address and alignment can change due to events outside the currently running process) has been fixed in the 11.x series of Intel compilers (which I'm using). However, I think you have probably hit upon the answer, in that there is some sort of instruction re-ordering going on when running with multiple CPUs and many open applications. Thanks again.
Jamie Cook