views:

81

answers:

3

I have a project where I have to perform some mathematics calculations with double variables. The problem is that I get different results on SUN Solaris 9 and Linux. There are a lot of ways (explained here and other forums) how to make Linux work as Sun, but not the other way around. I cannot touch the Linux code, so it is only SUN I can change. Is there any way to make SUN to behave as Linux?

The code I run(compile with gcc on both systems):

int hash_func(char *long_id)          
{                                 
    double      product, lnum, gold;
    while (*long_id)
        lnum = lnum * 10.0 + (*long_id++ - '0');
    printf("lnum  => %20.20f\n", lnum);
    lnum = lnum * 10.0E-8;
    printf("lnum  => %20.20f\n", lnum);
    gold = 0.6125423371582974;
    product = lnum * gold;
    printf("product => %20.20f\n", product);
    ...
}

if the input is 339886769243483

the output in Linux:

lnum  => 339886769243**483**.00000000000000000000

lnum  => 33988676.9243**4829473495483398**

product => 20819503.600158**59827399253845**

When on SUN:

lnum => 339886769243483.00000000000000000000

lnum => 33988676.92434830218553543091

product = 20819503.600158**60199928283691**

Note: The result is not always different, moreover most of the times it is the same. Just 10 15-digit numbers out of 60000 have this problem.

Please help!!!

+3  A: 

Two things:

Paul R
+2  A: 

These values differ by less than one part in 252. But a double-precision number has just 52 bits behind the radix point. So they differ by just the last bit. It may be that one machine isn't always rounding correctly, but you're doing two multiplications, so the answer is going to have that much error anyway.

Derek Ledbetter
+6  A: 

The real answer here is another question: why do you think you need this? There may be a better way to accomplish what you're trying to do that doesn't depend on intricate details of platform floating-point. Having said that...

It's unfortunate that you can't change the Linux code, since it's really the Linux results that are deficient here. The SUN results are as good as they could possibly be: they're correctly rounded; each multiplication gives the unique (in this case) C double that's closest to the result. In contrast, the first Linux multiplication does not give a correctly rounded result.

Your Linux results come from a 32-bit system on x86 hardware, right? The results you show are consistent with, and likely caused by, the phenomenon of 'double rounding': the result of the first multiplication is first rounded to 64-bit precision (the precision used internally by the Intel x87 FPU), and then re-rounded to the usual 53-bit precision of a double. Most of the time (around 1999 times out of 2000 or so on average) this double round has the same effect as a single round to 53-bit precision would have had, but occasionally it can produce a different result, and that's what you're seeing here.

As you say, there are ways to fix the Linux results to match the Solaris ones: one of these is to use appropriate compiler flags to force the use of SSE2 instructions for floating-point operations if possible. The recent 4.5 release of gcc also fixes the difference by means of a new -fexcess-precision flag, though the fix may impact performance when not using SSE2.

[Edit: after several rereads of the gcc manuals, the gcc-patches mailing list thread at http://gcc.gnu.org/ml/gcc-patches/2008-11/msg00105.html, and the related gcc bug report, it's still not clear to me whether use of -fexcess-precision=standard does in fact eliminate double rounding on x87 systems; I think the answer depends on the value of FLT_EVAL_METHOD. I don't have a 32-bit Linux/x86 machine handy to test this on.]

But I don't know how you'd fix the Solaris results to match the Linux ones, and I'm not sure why you'd want to: you'd be making the Solaris results less accurate instead of making the Linux results more accurate.

[Edit: caf has a good suggestion here. On Solaris, try deliberately using long double for intermediate results, then forcing back to double. If done right, this should reproduce the double rounding effect that you're seeing in Linux.]

See David Monniaux's excellent paper The pitfalls of verifying floating-point computations for a good explanation of double rounding. It's essential reading after the Goldberg article mentioned in an earlier answer.

Mark Dickinson
The OP could try doing the calculation in a `long double` on SPARC, and then rounding the result to a `double` - emulating the effect of the extended precision x87 calculations. No guarantees this will work though.
caf
@caf: Good point---definitely worth a try! It would give different results again on some other platforms, though---e.g., OS X/PPC, where 'long double' is the peculiar 'double double' format, or Windows, where 'long double' seems to be the same as 'double'. Maybe this doesn't matter for the OP's needs.I guess the real answer here is another question: "Why do you need this?".
Mark Dickinson
@Mark Dickinson: Thank you for this detailed explanation.I do understand that I want to make SUN work "incorrectly".Regarding why: I have a system that uses the above code(on Linux) for many years. Now I write another application on SUN, that shall use the same code and giving that same input to produce the same results. The actual 'correctness' of the computation is not important as soon as I have the same values on both SUN and Linux.
Marina
@caf: Thanks, I tried, it helped in part of the cases, but not all of them. in the first computation, though, `lnum = lnum * 10.0E-8` it helped more.I'll try more combinations later.Maybe there are some compiler flags that can help?
Marina
@Marina: It's probable that the x87 version *isn't* rounding the intermediate result to `double` (you would have to check the assembly to know for sure), so try doing all the calculations in `long double` and not rounding until the final step. To get a perfect reproduction though, you might have to somehow round the intermediate results to an 80 bit mantissa (SPARC's `long double` has a 112 bit mantissa).
caf
@caf: Do we know that the SUN hardware Marina is using is SPARC, rather than x86? Actually, is it even SUN hardware? I read SUN Solaris 9 as an architecture/OS pair, but maybe it wasn't intended that way...
Mark Dickinson
I imagine gcc targetting x86 would behave the same way on Solaris x86 and Linux x86, BICBW. That's why I assumed SPARC, anyway (apart from the fact that most of the Solaris machines I deal with are SPARC).
caf
@caf: Ah yes, good point; I imagine you're right.
Mark Dickinson