+1  A: 

Hi

Not an answer as such, but more text (and formatting) than I could fit in a comment. Reading your question, it strikes me that you have probably considered all of this, but not told us, so this may all be irrelevant chatter. If it is, I apologise.

Can you (did you?) enforce adherence to IEEE754 rules for floating-point arithmetic on either the original or ported versions of the program? My first guess is that the two platforms (combination of hardware, o/s, libraries) implement different approaches to fp arithmetic.
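
If it helps, one cheap cross-platform sanity check (my sketch, not something from the original question) is to ask each compiler whether its double claims IEC 559 / IEEE 754 conformance; build-time flags such as GCC/Clang's -ffp-contract=off (which disables fused multiply-add contraction) are a separate knob worth checking on both sides.

    // Sketch only: run this on both platforms and compare the output.
    #include <cstdio>
    #include <limits>

    int main() {
        std::printf("double claims IEC 559 (IEEE 754): %s\n",
                    std::numeric_limits<double>::is_iec559 ? "yes" : "no");
        std::printf("mantissa bits in double: %d\n",
                    std::numeric_limits<double>::digits);  // 53 for IEEE double
        return 0;
    }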

What assumptions (if any) have you made about the default sizes, on the two platforms, of some of the fundamental types such as ints and floats? The C standard (and, I believe, the C++ standard) allows platform dependency for some of these (I can't remember which off the top of my head; I'm really a Fortran programmer).
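
A minimal sketch (mine, not the questioner's) of nailing those assumptions down at compile time on both builds; the sizes asserted here are illustrative assumptions, and pre-C++11 compilers would need a negative-array-size trick instead of static_assert:

    // Sketch: make the build fail loudly if a size assumption differs between platforms.
    static_assert(sizeof(int)    == 4, "unexpected int size");
    static_assert(sizeof(float)  == 4, "unexpected float size");
    static_assert(sizeof(double) == 8, "unexpected double size");
    static_assert(sizeof(long double) >= 8, "long double narrower than expected");

    int main() { return 0; }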

Final guess -- I've grown used (in my Fortranny world) to specifying float constants such as your 4.0 with sufficient digits to specify all the (decimal) digits in the preferred representation, i.e. something like 4.000000000000000000000000. I know that, in Fortran, a 4-byte float constant such as 3.14159265 will, when automatically cast to 8 bytes, not fill the extra bytes with the further digits in the decimal expansion of pi. This may be affecting you.
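
A small C++ illustration of that last point (the numbers are mine, purely for demonstration): widening a single-precision constant does not recover digits that were never stored in the first place.

    #include <cstdio>

    int main() {
        float  f  = 3.14159265f;          // only ~7 significant decimal digits survive
        double d  = f;                    // widening copies the value; it adds no digits
        double pi = 3.14159265358979324;  // a genuine double-precision constant
        std::printf("%.17g\n", d);        // prints roughly 3.1415927410125732
        std::printf("%.17g\n", pi);       // prints 3.1415926535897931
        return 0;
    }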

None of this really helps you ensure that the ported version of your code produces the same results, bit for bit, as the original version; it only helps you identify sources of difference.

Finally, is your requirement that the new version produce the same results as the old version, or that you provide assurance to your customers that the new version produces accurate answers ? Your question leaves open the possibility that the old version of the program was 'wronger' than the new, taking into account all the sources of error in numerical computations.

Regards

Mark

High Performance Mark
The original code may indeed be "wronger" but if so we need to come up with a scientifically valid reason as to how the new code is acceptable. Unfortunately, whilst we can experiment with the old code to see its sensitivities, the results it calculated stand as the benchmark we have to approach "within reason". I'm pretty sure the original does NOT enforce IEEE754 throughout - the severe shift in results caused by turning off FMADD/SUB alone indicates that.
Andy Dent
Yes, it's depressing.
Andy Dent
And, as one number-cruncher to another I don't need to tell you this, but will anyway for the edification of the other readers of SO -- if your algorithms are that sensitive there's something wrong -- though whether it be something wrong with the world (failing to agree with the mathematics) or the model (discrete approximations to continuous mathematics) or the computation (floating-point arithmetic not being real arithmetic) is the ever open question. Sigghhhh.
High Performance Mark
I agree entirely about the sensitivity. There are some interesting meetings scheduled for next week ;-)
Andy Dent
+1  A: 

The whole question of floating point determinism across multiple platforms seems to be a very thorny issue and the more you dig into it, the worse it seems to get.

I did find this interesting article that discusses the problem in great depth - it might be able to throw up some ideas.

zebrabox
You deserve an upvote just for making me chuckle - great link, thanks, and it was nice to see an explanation of why we did so much better with SSE2 enabled.
Andy Dent
+1  A: 

I refer you to GCC bug 323:

I'd like to welcome the newest members of the bug 323 community, where all x87 floating point errors in gcc come to die! All floating point errors that use the x87 are welcome, despite the fact that many of them are easily fixable, and many are not! We're all one happy family, making the egregious mistake of wanting accuracy out of the most accurate general purpose FPU on the market!

The short summary is that it's incredibly tedious to get "true" IEEE floating-point singles/doubles on an x87 without a significant performance penalty; you suffer double-rounding of denorms even if you use fldcw, because fldcw reduces the precision but not the exponent range (and IIRC, IEEE FP specifically allows implementations to do their own thing WRT denorms). Presumably you could do something like this:

  1. Round to positive infinity, perform the operation (getting ldresult1), round to nearest even, convert to float (getting fresult1).
  2. RTNI, perform the op (getting ldresult2), RTNE, convert to float (getting fresult2).
  3. If they're the same, great: You have the correct RTNE float result. If not, then (I think) fresult2 < fresult1, and furthermore, fresult1=nextafterf(fresult2,+inf), and there are two possibilities:
    • ldresult1 == ((long double)fresult1+fresult2)/2. The "correct" answer is fresult2.
    • ldresult2 == ((long double)fresult1+fresult2)/2. The "correct" answer is fresult1.

I'm probably wrong in the details somewhere, but this is presumably the pain you have to go through when you get a denorm.
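
To make the register-width point concrete, here is a small sketch (mine, not from the bug report) of the x87's extended exponent range changing a result; whether the intermediate actually stays in an 80-bit register depends on compiler flags (e.g. GCC's -mfpmath=387 versus -mfpmath=sse) and optimisation level, so treat it as illustrative:

    #include <cstdio>

    int main() {
        volatile double v = 1e308;   // volatile just to discourage constant folding
        double x = (v * v) / v;      // v*v overflows a 64-bit double...
        std::printf("%g\n", x);      // ...so an SSE2 build typically prints "inf", while an
                                     // x87 build keeping v*v in an 80-bit register prints 1e+308
        return 0;
    }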

And then you hit the other issue: I'm pretty sure there's no guarantee about sqrt() returning the same result across different implementations (and I'm very sure there isn't for the trig functions); the only guarantee I've ever seen is that the result is "within 1 ulp" (presumably of the correctly rounded result). It's highly dependent on the algorithm used, and modern CPUs have instructions for these, so you suffer a significant performance penalty if you try to implement it in software. Nevertheless, ISTR a "portable" floating point library somewhere which was supposed to achieve consistency, but I don't remember the name OTTOMH.
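
Since "within n ulps" is probably the acceptance test you will end up with anyway, here is a hypothetical helper (my sketch, not a standard function) for measuring how many representable doubles separate an old result from a new one:

    #include <cstdint>
    #include <cstdio>
    #include <cstring>

    // Count the representable doubles between a and b.  Bit patterns are remapped
    // so the integer scale is monotonic across zero; NaNs and pairs of wildly
    // different sign/magnitude are not handled carefully in this sketch.
    std::int64_t ulp_distance(double a, double b) {
        std::int64_t ia, ib;
        std::memcpy(&ia, &a, sizeof ia);
        std::memcpy(&ib, &b, sizeof ib);
        if (ia < 0) ia = INT64_MIN - ia;
        if (ib < 0) ib = INT64_MIN - ib;
        return ia > ib ? ia - ib : ib - ia;
    }

    int main() {
        double old_result = 4.0;                 // made-up benchmark value
        double new_result = 4.0000000000000009;  // made-up ported value
        std::printf("%lld ulps apart\n",
                    (long long)ulp_distance(old_result, new_result));
        return 0;
    }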

tc.
joy, thanks for the update.
Andy Dent