Of course most languages have library functions for this, but suppose I want to do it myself.

Suppose that the float is given like in a C or Java program (except for the 'f' or 'd' suffix), for example "4.2e1", ".42e2" or simply "42". In general, we have the "integer part" before the decimal point, the "fractional part" after the decimal point, and the "exponent". All three are integers.

It is easy to find and process the individual digits, but how do you compose them into a value of type float or double without losing precision?

I'm thinking of multiplying the integer part with 10^n, where n is the number of digits in the fractional part, and then adding the fractional part to the integer part and subtracting n from the exponent. This effectively turns 4.2e1 into 42e0, for example. Then I could use the pow function to compute 10^exponent and multiply the result with the new integer part. The question is, does this method guarantee maximum precision throughout?

Any thoughts on this?

A: 

Using a state machine. It's fairly easy to do, and even works if the data stream is interrupted (you just have to keep the state and the partial result). You can also use a parser generator (if you're doing something more complex).

terminus
The parsing is not the problem, it's the construction of the resulting float that gives me trouble.
Thomas
A: 

For that you have to understand the IEEE 754 standard in order to build the proper binary representation. After that you can use Float.intBitsToFloat or Double.longBitsToDouble.

http://en.wikipedia.org/wiki/IEEE_754

smink
A: 

If you want the most precise result possible, you should use a higher internal working precision, and then downconvert the result to the desired precision. If you don't mind a few ULPs of error, then you can just repeatedly multiply by 10 as necessary with the desired precision. I would avoid the pow() function, since it will produce inexact results for large exponents.

Adam Rosenfield
+2  A: 

I would directly assemble the floating point number using its binary representation.

Read the number in one character at a time and first collect all of the digits, doing that in integer arithmetic. Also keep track of the decimal point's position and the exponent; both will be important later.

Now you can assemble your floating point number. The first thing to do is to scan the integer representation of the digits for the first set one-bit (highest to lowest).

The bits immediately following the first one-bit are your mantissa.

Getting the exponent isn't hard either. You know the first one-bit position, the position of the decimal point and the optional exponent from the scientific notation. Combine them and add the floating point exponent bias (I think it's 127, but check some reference please).

This exponent should be somewhere in the range of 0 to 255. If it's larger or smaller you have a positive or negative infinite number (special case).

Store the exponent into bits 23 to 30 of your float.

The most significant bit is simply the sign. One means negative, zero means positive.

It's harder to describe than it really is, try to decompose a floating point number and take a look at the exponent and mantissa and you'll see how easy it really is.

Btw - doing the arithmetic in floating point itself is a bad idea because you will always force your mantissa to be truncated to 23 significant bits. You won't get an exact representation that way.

Nils Pipenbrinck
@Nils: You're ignoring rounding modes, et al. Take a look at strtod to get a feel for what is necessary.
sixlettervariables
Yes, I know. There is even more I've left out, like handling denormals and zeros. But it seemed to me that the original poster wanted to do it for learning purposes, not for production.
Nils Pipenbrinck
Partly true. I want to read a float from a string, but there is other stuff following it inside the string. Java can't handle that. But since the problem turns out to be so fiendishly difficult, I'll just parse the float, put it in a string and throw it at Float.parseFloat() ;)
Thomas
@Thomas: if you're using Java, you'll find the bitmagic is a--excusing my pun--bit hard to do as easily as C. Start off using floats to make your float and then bulk out areas with the bitwise math as Nils suggested (such as the mantissa). You'll learn a lot about math.
sixlettervariables
This description forgets the IEEE-754 exponent is a binary exponent, therefore the mantissa has to be multiplied out: `1e2` => `1010b` => `1.01e11b`. Of course, you can't do this naively, that would take a 1024-bit number, you need to do it by long multiplication. Decent float parsing implementations do this with a base-5 bignum.
Simon Buchan
+1  A: 

You could ignore the decimal when parsing (except for its location). Say the input was: 156.7834e10... This could easily be parsed into the integer 1567834 followed by e10, which you'd then modify to e6, since the decimal was 4 digits from the end of the "numeral" portion of the float.

Precision is an issue. You'll need to check the IEEE spec of the language you're using. If the number of bits in the Mantissa (or Fraction) is larger than the number of bits in your Integer type, then you'll possibly lose precision when someone types in a number such as:

5123.123123e0 - converts to 5123123123 in our method, which does NOT fit in an Integer, but the bits for 5.123123123 may fit in the mantissa of the float spec.

Of course, you could use a method that takes each digit in front of the decimal, multiplies the current total (in a float) by 10, then adds the new digit. For digits after the decimal, divide the digit by a growing power of 10 before adding it to the current total. This method seems to beg the question of why you're doing this at all, however, as it requires the use of the floating point primitive without using the readily available parsing libraries.

Anyway, good luck!

Bill James
A: 

It is not possible to convert any arbitrary string representing a number into a double or float without losing precision. There are many fractional numbers that can be represented exactly in decimal (e.g. "0.1") that can only be approximated in a binary float or double. This is similar to how the fraction 1/3 cannot be represented exactly in decimal, you can only write 0.333333...

If you don't want to use a library function directly why not look at the source code for those library functions? You mentioned Java; most JDKs ship with source code for the class libraries so you could look up how the java.lang.Double.parseDouble(String) method works. Of course something like BigDecimal is better for controlling precision and rounding modes but you said it needs to be a float or double.

sk
+5  A: 

(EDIT: added a bit on David Goldberg's article)

All of the other answers have missed how hard it is to do this properly. You can take a first cut at this which is accurate to a certain extent, but until you take into account IEEE rounding modes (et al.), you will never have the right answer. I've written naive implementations before with a rather large amount of error.

If you're not scared of math, I highly recommend reading the following article by David Goldberg, What Every Computer Scientist Should Know About Floating-Point Arithmetic. You'll get a better understanding of what is going on under the hood, and why the bits are laid out as they are.

My best advice is to start with a working atoi implementation, and move out from there. You'll rapidly find you're missing things, but a few looks at strtod's source and you'll be on the right path (which is a long, long path). Eventually you'll praise [insert deity here] that there are standard libraries.

/* use this to start your atof implementation */

/* atoi - [email protected] */
/* PUBLIC DOMAIN */
#include <ctype.h>
#include <errno.h>
#include <limits.h>

long atoi(const char *value) {
  long ival = 0, c, n = 1, i = 0, oval;
  for( ; (c = value[i]); ++i) /* chomp leading spaces */
    if(!isspace(c)) break;
  if(c=='-' || c=='+') { /* chomp sign */
    n = (c!='-' ? n : -1); i++;
  }
  while((c = value[i++])) { /* parse number */
    if(!isdigit(c)) return 0;
    oval = ival; /* save ival for overflow detection */
    ival = (ival * 10) + (c - '0'); /* mult/accum */
    if(ival < oval) { /* report overflow/underflow */
      errno = ERANGE;
      return (n>0 ? LONG_MAX : LONG_MIN);
    }
  }
  return (n>0 ? ival : -ival);
}
sixlettervariables
A: 

The "standard" algorithm for converting a decimal number to the best floating-point approximation is William Clinger's How to read floating point numbers accurately, downloadable from here. Note that doing this correctly requires multiple-precision integers, at least a certain percentage of the time, in order to handle corner cases.

Algorithms for going the other way, printing the best decimal representation of a floating-point number, are found in Burger and Dybvig's Printing Floating-Point Numbers Quickly and Accurately, downloadable here. This also requires multiple-precision integer arithmetic.

See also David M Gay's Correctly Rounded Binary-Decimal and Decimal-Binary Conversions for algorithms going both ways.

Peter S. Housel
A: 

I agree with terminus. A state machine is the best way to accomplish this task, as there are many stupid ways a parser can be broken. I am working on one now; I think it is complete, and it has, I think, 13 states.

The problem is not trivial.

I am a hardware engineer interested in designing floating point hardware. I am on my second implementation.

I found this today: http://speleotrove.com/decimal/decarith.pdf which on page 18 gives some interesting test cases.

Yes, I have read Clinger's article, but being a simple-minded hardware engineer, I can't get my mind around the code presented. The reference to Steele's algorithm, as answered in Knuth's text, was helpful to me. Both input and output are problematic.

All of the aforementioned references to various articles are excellent.

I have yet to sign up here, but when I do, assuming the login is not taken, it will be broh. (broh-dot).

Clyde