views:

1362

answers:

10

In follow up to this question, it appears that some numbers cannot be represented by floating point at all, and instead are approximated.

How are floating point numbers stored?

Is there a common standard for the different sizes?

What kind of gotchas do I need to watch out for if I use floating point?

Are they cross-language compatible (i.e., what conversions do I need to deal with to send a floating point number from a Python program to a C program over TCP/IP)?

A: 

What I remember is that a 32-bit floating point number is stored using 24 bits for the actual number (the sign bit plus 23 fraction bits), and the remaining 8 bits are used as a power of 2, determining where the binary point is.

I'm a bit rusty on the subject, though...

Rik
+5  A: 

The standard is IEEE 754.

Of course, there are other means to store numbers when IEEE 754 isn't good enough. Libraries like Java's BigDecimal are available for most platforms and map well to SQL's number type. Symbols can be used for irrational numbers, and ratios that can't be accurately represented in binary or decimal floating point can be stored exactly as a ratio.
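To illustrate the ratio idea (a sketch in Python, since the question mentions Python), the standard-library `fractions.Fraction` and `decimal.Decimal` types hold values that binary floats can only approximate:

```python
from fractions import Fraction
from decimal import Decimal

# 1/3 has no finite binary (or decimal) expansion, but a ratio stores it exactly.
third = Fraction(1, 3)
print(third + third + third)                  # 1 -- exactly

# 0.1 has no finite binary expansion, so the float sum drifts.
print(0.1 + 0.1 + 0.1 == 0.3)                 # False
print(Decimal("0.1") * 3 == Decimal("0.3"))   # True
```
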

erickson
+1  A: 

This article entitled "IEEE Standard 754 Floating Point Numbers" may be helpful. To be honest, I'm not completely sure I understand your question, so I'm not sure this will be helpful, but I hope it will be.

Onorio Catenacci
+2  A: 

Yes, there is: the IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754).

When stored in binary, the number is split into three parts: sign, exponent, and fraction.
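As a sketch in Python, you can pull the three fields out of a 32-bit float with the standard `struct` module (field widths are those of IEEE 754 single precision: 1 sign bit, 8 exponent bits biased by 127, 23 fraction bits):

```python
import struct

def float_bits(x):
    """Split a 32-bit float into its (sign, exponent, fraction) fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF        # 23 bits (implicit leading 1 not stored)
    return sign, exponent, fraction

print(float_bits(1.0))    # (0, 127, 0): +1.0 * 2^(127-127)
print(float_bits(-2.5))   # (1, 128, 2097152): -1.25 * 2^(128-127)
```
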

stukelly
+3  A: 

Basically, what you need to worry about with floating point numbers is that they carry a limited number of digits of precision. This can cause problems when testing for equality, or if your program actually needs more digits of precision than that data type gives you.

In C++, a good rule of thumb is to think that a float gives you 7 digits of precision, while a double gives you 15. Also, if you are interested in knowing how to test for equality, you can look at this question thread.
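A small sketch of both points (in Python, whose `float` is a C `double` with ~15-16 significant digits; a C++ `float` would lose precision correspondingly sooner), including a tolerance-based equality test:

```python
import math

a = 0.1 + 0.2
print(a == 0.3)               # False: a is actually 0.30000000000000004
print(math.isclose(a, 0.3))   # True: compare with a tolerance instead

# A hand-rolled epsilon test, for languages without an isclose() helper:
print(abs(a - 0.3) < 1e-9)    # True
```
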

Craig H
+4  A: 

As to the second part of your question: unless performance and efficiency are important for your project, I suggest you transfer the floating point data as a string over TCP/IP. This lets you avoid issues such as byte ordering and will ease debugging.
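A minimal sketch of this idea in Python (the helper names are illustrative, not a standard API): format the float as its `repr` string, which since Python 3.1 emits enough digits for an exact round-trip, send the bytes, and parse on the receiving side.

```python
def encode_float(x):
    # repr() produces the shortest string that round-trips exactly
    return repr(x).encode("ascii")

def decode_float(data):
    return float(data.decode("ascii"))

payload = encode_float(0.1)
print(payload)                          # b'0.1'
print(decode_float(payload) == 0.1)     # True: exact round-trip
```

In C, the receiving end can parse the same string with `strtod`, so no binary layout or endianness questions arise.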

Knox
+5  A: 

A thorough explanation of the issues surrounding floating point numbers is given in the article What Every Computer Scientist Should Know About Floating-Point Arithmetic.

ChrisN
+1  A: 

If you're really worried about floating point rounding errors, most languages offer data types that don't have them. SQL Server has the Decimal and Money data types, and .NET has the Decimal data type. They aren't arbitrary precision like BigDecimal in Java, but they are exact down to the number of decimal places they are defined for. So you don't have to worry about a dollar value you type in as $4.58 getting saved as a floating point value of 4.579999999999997.
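Python's standard-library `decimal` module behaves the same way, as a quick sketch shows (note that you must construct the Decimal from a string, otherwise you inherit the float's binary approximation):

```python
from decimal import Decimal

price = Decimal("4.58")            # exact: built from the decimal string
print(price + Decimal("0.01"))     # 4.59, exactly

# Building a Decimal from a float instead captures the float's true
# binary value, which is not exactly 4.58:
print(Decimal(4.58) == Decimal("4.58"))   # False
```
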

Kibbee
+5  A: 

As mentioned, the Wikipedia article on IEEE 754 does a good job of showing how floating point numbers are stored on most systems.

Now, here are some common gotchas:

  • The biggest is that you almost never want to compare two floating point numbers for equality (or inequality). You'll want to compare within a tolerance, or use greater than/less than comparisons, instead.
  • The more operations you do on a floating point number, the more significant rounding errors can become.
  • Precision is limited by the size of the fraction, so you may not be able to correctly add numbers that are separated by several orders of magnitude. (For example, adding 1E-30 to 1E30 leaves 1E30 unchanged.)
Rob Pilkington
+2  A: 

In follow up to this question, it appears that some numbers cannot be represented by floating point at all, and instead are approximated.

Correct.

How are floating point numbers stored? Is there a common standard for the different sizes?

As the other posters already mentioned: almost exclusively IEEE 754 and its successor, IEEE 754r. Googling it gives you thousands of explanations, together with the bit patterns and what they mean. Beyond that, two other FP formats are still in common use: IBM hexadecimal floating point and DEC VAX. Some esoteric machines and compilers (BlitzBasic, TurboPascal) use odd formats of their own.

What kind of gotchas do I need to watch out for if I use floating point? Are they cross-language compatible (ie, what conversions do I need to deal with to send a floating point number from a python program to a C program over TCP/IP)?

Practically none, they are cross-language compatible.

Very rarely occurring quirks:

  • IEEE 754 defines sNaNs (signalling NaNs) and qNaNs (quiet NaNs). The former cause a trap that forces the processor to call a handler routine when loaded; the latter don't. Because language designers hated the possibility of sNaNs interrupting their workflow, and supporting them would require support for handler routines, sNaNs are almost always silently converted into qNaNs. So don't rely on a 1:1 raw conversion. But again: this is very rare and occurs only if NaNs are present.

  • You can have problems with endianness (the bytes are in the wrong order) if files are shared between different computers. It is usually easy to detect, because the mangled values come out as NaNs or wildly wrong numbers.
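The endianness point can be sketched with Python's `struct` module: pin an explicit byte order on both ends of the wire, because reading bytes with the wrong order produces garbage rather than the original value.

```python
import struct

x = 1.0
big = struct.pack(">d", x)      # big-endian ("network order") 64-bit double
little = struct.pack("<d", x)   # little-endian

# Unpacking with the wrong byte order yields a nonsense value:
wrong = struct.unpack("<d", big)[0]
print(wrong == x)                         # False

# Agreeing on one explicit order (e.g. ">d") on both ends is portable:
print(struct.unpack(">d", big)[0] == x)   # True
```

In C, the receiving side would read the 8 bytes in the same agreed-upon order before reinterpreting them as a `double`.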

Thorsten S.