First of all, this is not a floating point newbie question. I know that the results of floating point arithmetic (not to mention transcendental functions) usually cannot be represented exactly, and that most terminating decimals cannot be represented exactly as binary floating point numbers.

That said, each possible floating point value corresponds exactly to a dyadic rational (a rational number p/q where q is a power of 2), which in turn has an exact decimal representation.

My question is: How do you find this exact decimal representation efficiently? sprintf and similar functions are usually only specified to produce enough significant digits to uniquely determine the original floating point value; they don't necessarily print the exact decimal representation. I know one algorithm I've used, but it's very slow, O(e^2) where e is the exponent. Here's an outline:

  1. Convert the mantissa to a decimal integer. You can either do this by pulling apart the bits to read the mantissa directly, or you can write a messy floating point loop that first multiplies the value by a power of two to put it in the range 1<=x<10, then pulls off a digit at a time by casting to int, subtracting, and multiplying by 10.
  2. Apply the exponent by repeatedly multiplying or dividing by 2. This is an operation on the string of decimal digits you generated. Every ~3 multiplications will add an extra digit to the left. Every single division will add an extra digit to the right.
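
For concreteness, here's a rough sketch of the kind of digit-string doubling and halving I mean in step 2 (buffer management and decimal-point bookkeeping are stripped out; the caller has to supply buffers with room to grow):

    #include <stdio.h>
    #include <string.h>

    /* Double a decimal integer stored as ASCII digits, most significant
     * digit first.  May grow the string by one digit on the left. */
    static void double_decimal(char *buf)
    {
        size_t len = strlen(buf);
        int carry = 0;
        for (size_t i = len; i-- > 0; ) {
            int d = (buf[i] - '0') * 2 + carry;
            buf[i] = (char)('0' + d % 10);
            carry = d / 10;
        }
        if (carry) {
            memmove(buf + 1, buf, len + 1);   /* shift right, keep '\0' */
            buf[0] = (char)('0' + carry);
        }
    }

    /* Halve a decimal fraction 0.d1d2d3... stored as ASCII digits.
     * May grow the string by one digit (a trailing '5') on the right. */
    static void halve_decimal(char *buf)
    {
        size_t len = strlen(buf);
        int rem = 0;
        for (size_t i = 0; i < len; i++) {
            int d = rem * 10 + (buf[i] - '0');
            buf[i] = (char)('0' + d / 2);
            rem = d % 2;
        }
        if (rem) {
            buf[len] = '5';
            buf[len + 1] = '\0';
        }
    }

    int main(void)
    {
        char ipart[64] = "3";   /* integer part: 3 */
        char fpart[64] = "1";   /* fraction part: 0.1 */
        double_decimal(ipart);  /* 3 -> 6 */
        halve_decimal(fpart);   /* 0.1 -> 0.05 */
        printf("%s and 0.%s\n", ipart, fpart);
        return 0;
    }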

Is this really the best possible? I doubt it, but I'm not a floating-point expert and I can't find a way to do the base-10 computations on the floating point representation of the number without running into a possibility of inexact results (multiplying or dividing by anything but a power of 2 is a lossy operation on floating point numbers unless you know you have free bits to work with).

+1  A: 

You don't. The closest you can come to that is dumping the bytes.

Steven Sudit
I've thought about this some more, and I think I'm wrong. Since 2 divides 10 evenly, no binary fraction should require repeating digits in decimal. As a result, you should in principle be able to convert a float/double to a (potentially very long) string of decimal digits.
Steven Sudit
Of course you can. I have an implementation that does it in `O(e^2)` time (which hopefully can be improved) and `O(e)` space (which the decimal representation necessarily requires) as I described.
R..
To finish answering, yes, the algorithm you described looks like it would work, but an arbitrary precision library (like the one Byron recommended) would make things easy. For something related but I think different, there's also: http://keithbriggs.info/xrc.html
Steven Sudit
I suspect that implementing the multiplications as shifts will speed things up, but that doesn't necessarily improve the big O.
Steven Sudit
I think what I just wrote is wrong, because I missed the fact that the doubling is happening to the decimal value. Perhaps the way to handle this is to keep the output in a format like BCD until you're done.
Steven Sudit
You can definitely store several decimal digits in each array position; in fact it should work to use 32-bit integers storing values up to a billion (nine digits each), as long as you handle the carry/remainder right. The big-O doesn't improve but the constant is slashed by a factor of 9. At the end, each 32-bit cell can be printed with normal integer printing.
R..
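A sketch of that limb idea, assuming 32-bit limbs holding nine decimal digits each, least significant limb first (the helper name is made up for illustration):

    #include <stdint.h>
    #include <stdio.h>

    #define LIMB_BASE 1000000000u   /* 10^9: nine decimal digits per 32-bit limb */

    /* Double a number stored as base-10^9 limbs, least significant limb first.
     * 2*999999999 + 1 still fits in a uint32_t, so no wider type is needed.
     * Returns the carry (0 or 1) out of the top limb. */
    static uint32_t limbs_double(uint32_t *limb, size_t n)
    {
        uint32_t carry = 0;
        for (size_t i = 0; i < n; i++) {
            uint32_t d = limb[i] * 2u + carry;
            limb[i] = d % LIMB_BASE;
            carry   = d / LIMB_BASE;
        }
        return carry;
    }

    int main(void)
    {
        uint32_t n[2] = { 999999999u, 0 };   /* the value 999,999,999 */
        uint32_t carry = limbs_double(n, 2);
        /* print most significant limb first; lower limbs are zero-padded to 9 digits */
        printf("%u%09u (carry %u)\n", n[1], n[0], carry);
        return 0;
    }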
Skeet's code takes the compromise of 1 digit per byte, as opposed to the 2 that BCD would bring. While he loses compactness, he significantly simplifies the implementation of operations such as shifting and multiplying, so it might be worth it. It's better than the usual BigNumber solution of using an ASCII character for each digit, because you avoid having to mask the '0' (ASCII 48) in and out.
Steven Sudit
I was reading http://en.wikipedia.org/wiki/Binary-coded_decimal, which gives the algorithm for adding BCD values. You could perform a doubling simply by adding the value to itself. Essentially, you do a binary add and then adjust it back to BCD. This could be faster than digit-by-digit adding.
Steven Sudit
This is a link to an explanation of the algorithm used by the old ASM instruction, DAA: http://faydoc.tripod.com/cpu/daa.htm. At first glance, it looks like it could be scaled up, with a table implementation that implements carry by returning values twice as wide as the input.
Steven Sudit
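A sketch of that digit-pair idea, assuming packed BCD (two decimal digits per byte, least significant byte first); it does the decimal adjust inline rather than via a DAA-style lookup table:

    #include <stdint.h>
    #include <stdio.h>

    /* Double a packed-BCD number (two decimal digits per byte),
     * stored least significant byte first. */
    static void bcd_double(uint8_t *d, size_t nbytes)
    {
        int carry = 0;
        for (size_t i = 0; i < nbytes; i++) {
            int lo = (d[i] & 0x0F) * 2 + carry;
            int hi = (d[i] >> 4) * 2;
            if (lo > 9) { lo -= 10; hi++; }        /* decimal adjust, low nibble */
            if (hi > 9) { hi -= 10; carry = 1; }   /* decimal adjust, high nibble */
            else        carry = 0;
            d[i] = (uint8_t)((hi << 4) | lo);
        }
        /* a nonzero carry here means the buffer was too small */
    }

    int main(void)
    {
        uint8_t n[2] = { 0x95, 0x00 };   /* the value 95 */
        bcd_double(n, 2);                /* 95 * 2 = 190 */
        printf("%x%02x\n", n[1], n[0]);  /* each nibble is a decimal digit, so %x prints 190 */
        return 0;
    }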
I think I'm out of ideas, although I've done enough to repent for my original mistake. I do want to point out that, even though this prints the precise decimal equivalent of the float, the float itself may not have been initialized with precise equivalents of the decimal, and even if it was, then operations on it may well have generated repeating digits and other artifacts. If you really want a precise answer, something like xrc is the way to go. Good luck.
Steven Sudit
This answer was downvoted sometime after it was originally written as part of a pattern of arbitrary downvotes.
Steven Sudit
+2  A: 

Well, being no floating point expert myself, I'd defer to a well-tested open source library.

GNU MPFR is a good one.

The MPFR library is a C library for multiple-precision floating-point computations with correct rounding. The main goal of MPFR is to provide a library for multiple-precision floating-point computation which is both efficient and has a well-defined semantics.

Byron Whitlock
And it does support conversion from double to arbitrary decimal.
Steven Sudit
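For illustration, a minimal sketch of that conversion with MPFR (assuming MPFR and GMP are installed; the rounding mode is irrelevant here because the double is stored exactly):

    #include <stdio.h>
    #include <mpfr.h>

    int main(void)
    {
        double x = 0.1;
        mpfr_t f;
        mpfr_init2(f, 53);              /* a double has at most 53 significand bits,
                                           so this holds x exactly */
        mpfr_set_d(f, x, MPFR_RNDN);    /* no rounding actually happens */
        mpfr_printf("%.1074Rf\n", f);   /* 1074 fractional digits covers even the
                                           smallest subnormal; the output is exact,
                                           padded with trailing zeros */
        mpfr_clear(f);
        return 0;
    }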
A: 

If you want more exact results, why not use fixed point math instead? Conversions are quick. Error is known and can be worked around. Not an exact answer to your question, but a different idea for you.

Michael Dorgan
Wouldn't be a bad idea if I were using this in a specific application, but the problem domain is specifically solving this (rather painful) floating point to exact decimal conversion.
R..
@R.: Then maybe that xrc lib *is* right for you.
Steven Sudit
A: 

Off the top of my head, why not break the exponent down into a sum of binary exponents first, so that all your operations are lossless?

I.e.

10^2 = 2^6 + 2^5 + 2^2

Then sum:

mantissa<<6 + mantissa<<5 + mantissa<<2

I'm thinking that breaking it down would be O(n) in the number of digits, the shifting is O(1), and the summing is O(n) digits...

You would have to have an integer class big enough to store the results, of course...

Let me know - I'm curious about this, it really made me think. :-)

eruciform
The exponent is a binary exponent to begin with. And there's definitely no integer type (without some sort of bigint) capable of storing the result. It can be over 1000 digits with a double, and over 16000 digits with a long double. :-)
R..
@R: I guess you could calloc(1000) and then bit-copy things in the right place. But definitely messy. Floating point is there for a reason. :-)
eruciform
+2  A: 

Although it's C# and your question is tagged with C, Jon Skeet has code to convert a double to its exact representation as a string: http://www.yoda.arachsys.com/csharp/DoubleConverter.cs

From a quick glance, it does not appear to be too hard to port to C, and even easier to write in C++.

Upon further reflection, it appears that Jon's algorithm is also O(e^2), as it also loops over the exponent. However, that means the algorithm is O(log(n)^2) (where n is the floating-point number), and I'm not sure you can convert from base 2 to base 10 in better than log-squared time.

Gabe
Interesting. Looks like he took that BCD approach, or close to it.
Steven Sudit
That is the same method he mentioned in the question.
Larry Wang
@Kaestur: Yes, but the code shows how to handle the fringe cases, such as subnormals. It's worth looking at.
Steven Sudit
If you're considering the theoretical big-O (and bignum stuff), then conversion from base 2 to base 10 probably can't be done in less than log-squared time. But if your numbers fit in machine words, it's log time, which is much better. The question is whether you can do the same thing for floating point numbers using the machine's floating point arithmetic.
R..
My implementation used the ugly loop (rather than bit fiddling) to extract the mantissa, so it didn't care if the floating point value was subnormal to begin with. `for (e=0; x<1; x*=2, e--);` brought it into normal range in a few iterations.
R..
+16  A: 

This question has a bureaucratic part and an algorithmic part. A floating point number is stored internally as 2^e*m, where e is an exponent (itself in binary) and m is a mantissa. The bureaucratic part of the question is how to access this data, but R. seems more interested in the algorithmic part of the question, namely converting 2^e*m to a fraction a/b in decimal form. The answer to the bureaucratic question in several languages is "frexp" (which is an interesting detail that I didn't know before today).

It is true that at first glance, it takes O(e^2) work just to write 2^e in decimal, and more time still for the mantissa. But, thanks to the magic of the Schonhage-Strassen fast multiplication algorithm, you can do it in O-tilde(e) time, where the tilde means "up to log factors". If you view Schonhage-Strassen as magic, then it's not that hard to think of what to do. If e is even, you can recursively compute 2^(e/2), and then square it using fast multiplication. On the other hand if e is odd, you can recursively compute 2^(e-1) and then double it. You have to be careful to check that there is a version of Schonhage-Strassen in base 10. Although it is not widely documented, it can be done in any base.

Converting a very long mantissa from binary to base 10 is not exactly the same question, but it has a similar answer. You can divide the mantissa into two halves, m = a*2^k+b. Then recursively convert a and b to base 10, convert 2^k to base 10, and do another fast multiplication to compute m in base 10.

The abstract result behind all of this is that you can convert integers from one base to another in O-tilde(N) time.

If the question is about standard 64-bit floating point numbers, then it's too small for the fancy Schonhage-Strassen algorithm. In this range you can instead save work with various tricks. One approach is to store all 2048 values of 2^e in a lookup table, and then work in the mantissa with asymmetric multiplication (in between long multiplication and short multiplication). Another trick is to work in base 10000 (or a higher power of 10, depending on architecture) instead of base 10. But, as R. points out in the comments, 128-bit floating point numbers already allow large enough exponents to call into question both lookup tables and standard long multiplication. As a practical matter, long multiplication is the fastest up to a handful of digits, then in a significant medium range one can use Karatsuba multiplication or Toom-Cook multiplication, and then after that a variation of Schonhage-Strassen is best not just in theory but also in practice.

Actually, the big integer package GMP already has O-tilde(N) time radix conversion, as well as good heuristics for which choice of multiplication algorithm. The only difference between their solution and mine is that instead of doing any big arithmetic in base 10, they compute large powers of 10 in base 2. In this solution, they also need fast division, but that can be obtained from fast multiplication in any of several ways.

Greg Kuperberg
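Not GMP's own conversion routine, just an illustration of the arithmetic: a small sketch using GMP's mpz layer and the identity m*2^e = m*5^(-e) / 10^(-e) for a negative exponent e (positive exponents, negative values, and error handling are omitted):

    #include <stdio.h>
    #include <string.h>
    #include <math.h>
    #include <gmp.h>

    int main(void)
    {
        double x = 0.1;             /* assume x > 0 with a negative exponent */
        int e;
        double frac = frexp(x, &e); /* x = frac * 2^e with 0.5 <= frac < 1 */
        e -= 53;                    /* now x = m * 2^e with m = frac * 2^53, an integer */

        mpz_t t, m;
        mpz_inits(t, m, NULL);
        mpz_set_d(m, ldexp(frac, 53));            /* exact: m fits in 53 bits */
        mpz_ui_pow_ui(t, 5, (unsigned long)(-e)); /* 5^(-e) */
        mpz_mul(t, t, m);                         /* m * 5^(-e) */

        /* x = m*5^(-e) / 10^(-e), so print the digits of t with a decimal
         * point inserted -e places from the right. */
        char digits[2048];
        mpz_get_str(digits, 10, t);
        size_t len = strlen(digits), shift = (size_t)(-e);
        if (len <= shift) {
            printf("0.");
            for (size_t i = len; i < shift; i++) putchar('0');
            printf("%s\n", digits);
        } else {
            printf("%.*s.%s\n", (int)(len - shift), digits, digits + (len - shift));
        }

        mpz_clears(t, m, NULL);
        return 0;
    }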
Thanks for the link and the first answer with any theoretical content! It looks like [Toom-Cook](http://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication) might actually be the preferable algorithm for non-astronomical-exponents.
R..
Very interesting. Could you explain how using base 10000 speeds things up?
Steven Sudit
Steven: Using base 10000 speeds things up because a base-10000 digit fits in a machine word just as a base-10 digit does, so it's roughly 4 times faster than base 10.
Gabe
Don't get too excited: "As a rule of thumb, Karatsuba is usually faster when the multiplicands are longer than 320-640 bits." -http://en.wikipedia.org/wiki/Karatsuba_algorithm
Steven Sudit
@Gabe: That makes sense. Base 10000 can be trivially converted to decimal, but speeds up the intermediate steps by lowering the number of digits. I'm not sure how this compares to the BCD-based approaches, but it's certainly viable.
Steven Sudit
R.: Well, the adjective "astronomical" is a bit flat because there are various sizes of numbers for various purposes. It is true that in a medium range you or a library might use Karatsuba or Toom-Cook, but actually both of them are similar to Schonhage-Strassen.
Greg Kuperberg
R: Even Toom-Cook and Karatsuba are too big for piddly little 64-bit floats. If you just wanted a faster implementation you would simply use a bigger base than 10 (1e2 for bytes, 1e4 for words, and 1e9 for ints) and do larger multiplies (e.g. do 1/3 as many multiplies by 8 or 1/16 as many multiplies by 65536).
Gabe
Steven: See for instance http://stackoverflow.com/questions/1469529/#1875697 . For multiplication you wouldn't just want one digit to fit in a machine word, but also the product of two digits.
Greg Kuperberg
@Gabe, are you sure? A "64-bit" float involves ~1076-digit (decimal) arithmetic. An "80-bit" float involves ~16448-digit arithmetic.
R..
R.: Actually the max 64-bit float in IEEE 754 has 309 decimal digits and a max 128-bit float has 4933 decimal digits. Remember, there will be fewer decimal digits than binary digits. But you're right that that is getting to the range of accelerated multiplication algorithms.
Greg Kuperberg
You're thinking of cases where the exponent is positive. If it's negative, every time you decrement the exponent further you get an extra decimal place on the right (holding a '5') but it takes several exponent decrements to clear a decimal place on the left (e.g. 5->2->1->0). I overestimated but it seems you need roughly binary_exp*2/3 decimal digits, so ~700 digits for IEEE 754.
R..
Oh, I missed a detail in the question. I thought you wanted a decimal representation of the numerator and denominator of the fraction. For the exact decimal of the fraction itself, I agree with your estimate.
Greg Kuperberg
R: Assuming you use 64-bit machine words (base 1e19), each double-precision float will only require 17 (or 37 for negative exponents) words at most to represent, and you could then multiply by 2^64 at a time, so you would only have to do 16 multiplies. Since that's just worst case, most conversions would require very few operations, so Karatsuba or Toom-3 wouldn't give you an advantage.
Gabe
I was thinking 32-bit machine words which are still pretty much the norm. I could try making the code use `size_t` or another type that's essentially-guaranteed to be machine-word-sized instead of `uint32_t` with compile-time computations to determine the base that gets used, but it might be a bit of a pain.
R..
@Greg: If I understand correctly, the "320-640 bits" does not apply to the "compact" floating point representation here but rather to the full expansion (whether decimal or otherwise).
Steven Sudit
Steven: I think that that's right. However, it applies, because the question posed was about converting a floating point binary to a full decimal. Actually Karatsuba in base 10 would become competitive sooner, because the advice about the threshold assumes a fast arithmetic unit in the CPU.
Greg Kuperberg
A: 

There's been a lot of work on printing floating-point numbers. The gold standard is to print out a decimal equivalent of minimal length such that when the decimal equivalent is read back in, you get the same floating-point number that you started with, no matter what the rounding mode is during readback. You can read about the algorithm in the excellent paper by Burger and Dybvig.

Norman Ramsey
That's a well-researched problem that's in some ways simpler and in some ways more difficult, but regardless it's a different problem. Thanks for the link though.
R..
@R: Oops. I failed to understand the question. Perhaps an example would have helped.
Norman Ramsey
+6  A: 

I see you've accepted an answer already, but here are a couple of open source implementations of this conversion you might want to look at:

1) David Gay's dtoa() function in dtoa.c (http://www.netlib.org/fp/dtoa.c).

2) The function ___printf_fp() in file /stdio-common/printf_fp.c in glibc (http://ftp.gnu.org/gnu/glibc/glibc-2.11.2.tar.gz, for example).

Both will print as many digits as you ask for in a %f type printf (as I've written about here: http://www.exploringbinary.com/print-precision-of-dyadic-fractions-varies-by-language/ and http://www.exploringbinary.com/print-precision-of-floating-point-integers-varies-too/ ).

Rick Regan
Great answer! This is the type of thing I was looking for. I'll check out those sources.
R..
Interesting code, too.
Steven Sudit
A: 

sprintf and similar functions are usually only specified up to a number of significant digits to uniquely determine the original floating point value; they don't necessarily print the exact decimal representation.

You can ask for more significant digits than the default:

printf("%.100g\n", 0.1);

prints 0.1000000000000000055511151231257827021181583404541015625.

dan04
Your system's printf happens to do the polite (but not specified by any standard) thing and compute as many digits as requested. Most just chop everything off after computing sufficiently many digits to uniquely determine the float. See the links in Rick Regan's answer.
R..