views:

263

answers:

6

I'm trying to write a function in the D programming language to replace the calls to C's strtold. (Rationale: To use strtold from D, you have to convert D strings to C strings, which is inefficient. Also, strtold can't be executed at compile time.) I've come up with an implementation that mostly works, but I seem to lose some precision in the least significant bits.

The code to the interesting part of the algorithm is below and I can see where the precision loss comes from, but I don't know how to get rid of it. (I've left out a lot of the parts of code that weren't relevant to the core algorithm to save people reading.) What string-to-float algorithm will guarantee that the result will be as close as possible on the IEEE number line to the value represented by the string.

real currentPlace = 10.0L ^^ (pointPos - ePos + 1 + expon);

real ans = 0;
for(int index = ePos - 1; index > -1; index--) {
    if(str[index] == '.') {
        continue;
    }

    if(str[index] < '0' || str[index] > '9') {
        err();
    }

    auto digit = cast(int) str[index] - cast(int) '0';
    ans += digit * currentPlace;
    currentPlace *= 10;
}

return ans * sign;

Also, I'm using the unit tests for the old version, which did things like:

assert(to!(real)("0.456") == 0.456L);

Is it possible that the answers being produced by my function are actually more accurate than the representation the compiler produces when parsing a floating point literal, but the compiler (which is written in C++) always agrees exactly with strtold because it uses strtold internally for parsing floating point literals?

A: 

You can't store most floats with perfect accuracy in a digital computer

Martin Beckett
Right. This is why I define "perfect" accuracy as the closest (32/64/80)-bit floating-point number to the number represented by the input string.
dsimcha
+3  A: 

Clinger and Steele & White developed fine algorithms for reading and writing floating point.

There's a retrospective here along with some references to implementations.

David Gay's paper improving Clinger's work, and Gay's implementation in C are great. I have used them in embedded systems, and I believe Gay's dtoa made its way into many libc's.

Doug Currie
Gay's source code contains it's own BigInt implementation which it uses in some of its temporary calculations, its quite a bit of code. I think the paper is much easier to read.
John Knoeller
Gay's dtoa is used everywhere: Firefox, Opera, Safari, Thunderbird, KDE, Chrome, Python, MySQL, Mac OS X, J, and D, for example.
Rick Regan
+1  A: 

Start by accumulating the digits as an integer value, ignoring the decimal point and the exponent. You would still use a floating point accumulator, but you would have no fractional part, this avoids loss of precision because of the inability to exactly express floating point numbers. (you chould also ignore fractional digits that are beyond the precision of floats to represent - 8 digits for 32 bit IEEE floats).

You could use a 64bit integer to accumulate digits if you prefer but you have to be careful to ignore extra digits that would cause overflow if you do. (you may still have to take these digits into account when determining the exponent)

Then scale this value by the exponent, taking into account the location of the decimal point that you ignored while accumulating digits.

John Knoeller
A: 

You create a floating point number for each digit and then add these numbers together. Since floating point numbers are not exact, but rounded to a certain number of binary digits, this involves small imprecisions when storing the single numbers and adding them up. Therefore adding the floating point numbers for the single digits together might give a result with a small rounding error.

An example would be 0.1 + 0.02, which is not equal to 0.12 if represented as a floating point number. (To verify this just try to compare them in your favorite programming language)

sth
+2  A: 

Honestly, this is one of those things that you really ought not be doing if you don't already know how to do it. It's full of pitfalls, and even if you manage to get it right, it will likely be tremendously slow if you don't have expertise in analyzing low-level numerics performance.

That said, if you're really determined to write your own implementation, the best reference for correctness is David Gay's "Correctly Rounded Binary-Decimal and Decimal-Binary Conversions" (postscript version). You should also study his reference implementations (in C), which are available on Netlib.

Stephen Canon
A: 

I'm not sure what your saying. I tried the D programming language, and C#, both of them gave me 0.12.

Hmm.. what am I talking about. Some thing about adding 0.1 to 0.12 and not getting 0.12.

Joel