views:

5327

answers:

7

Hi All,

Just wondering what the difference between a signle precision floating point operation and double precision floating operation is.

I'm especially interested in practical terms in relation to video game consoles, for example does the nintendo 64 have a 64 bit processor and if it does then would that mean it was capable of double precision floating point operations? Can the ps3 and xbxo 360 pull off double precision floating point operations or only single precision and in generel use is the double precision capabilities made use of (if they exist?).

Thanks for any help!

+1  A: 

Basically single precision floating point arithmetic deals with 32 bit floating point numbers whereas double precision deals with 64 bit.

The number of bits in double precision increases the maximum value that can be stored as well as increasing the precision (ie the number of significant digits).

cletus
A: 

It's all about precision ;-)

Hava a look at wikipedia.

For a better understanding Single precision and Double precision

Peter
+3  A: 

Okay, the basic difference at the machine is that double precision uses twice as many bits as single. In the usual implementation,that's 32 bits for single, 64 bits for double.

But what does that mean? If we assume the IEEE standard, then a single precision number has about 23 bits of the mantissa, and a maximum exponent of about 38; a double precision has 52 bits for the mantissa, and a maximum exponent of about 308.

The details are at Wikipedia, as usual.

Charlie Martin
+10  A: 

Note: the Nintendo 64 does have a 64-bit processor, however:

Many games took advantage of the chip's 32-bit processing mode as the greater data precision available with 64-bit data types is not typically required by 3D games, as well as the fact that processing 64-bit data uses twice as much RAM, cache, and bandwidth, thereby reducing the overall system performance.

From Webopedia:

The term double precision is something of a misnomer because the precision is not really double.
The word double derives from the fact that a double-precision number uses twice as many bits as a regular floating-point number.
For example, if a single-precision number requires 32 bits, its double-precision counterpart will be 64 bits long.

The extra bits increase not only the precision but also the range of magnitudes that can be represented.
The exact amount by which the precision and range of magnitudes are increased depends on what format the program is using to represent floating-point values.
Most computers use a standard format known as the IEEE floating-point format.

From the IEEE standard for floating point arithmetic

Single Precision

The IEEE single precision floating point standard representation requires a 32 bit word, which may be represented as numbered from 0 to 31, left to right.

  • The first bit is the sign bit, S,
  • the next eight bits are the exponent bits, 'E', and
  • the final 23 bits are the fraction 'F':

    S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF 0 1 8 9 31

The value V represented by the word may be determined as follows:

  • If E=255 and F is nonzero, then V=NaN ("Not a number")
  • If E=255 and F is zero and S is 1, then V=-Infinity
  • If E=255 and F is zero and S is 0, then V=Infinity
  • If 0<E<255 then V=(-1)**S * 2 ** (E-127) * (1.F) where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
  • If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-126) * (0.F) These are "unnormalized" values.
  • If E=0 and F is zero and S is 1, then V=-0
  • If E=0 and F is zero and S is 0, then V=0

In particular,

  0 00000000 00000000000000000000000 = 0
  1 00000000 00000000000000000000000 = -0

  0 11111111 00000000000000000000000 = Infinity
  1 11111111 00000000000000000000000 = -Infinity

  0 11111111 00000100000000000000000 = NaN
  1 11111111 00100010001001010101010 = NaN

  0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
  0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
  1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5

  0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
  0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127) 
  0 00000000 00000000000000000000001 = +1 * 2**(-126) * 
                                       0.00000000000000000000001 = 
                                       2**(-149)  (Smallest positive value)

Double Precision

The IEEE double precision floating point standard representation requires a 64 bit word, which may be represented as numbered from 0 to 63, left to right.

  • The first bit is the sign bit, S,
  • the next eleven bits are the exponent bits, 'E', and
  • the final 52 bits are the fraction 'F':

    S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF 0 1 11 12 63

The value V represented by the word may be determined as follows:

  • If E=2047 and F is nonzero, then V=NaN ("Not a number")
  • If E=2047 and F is zero and S is 1, then V=-Infinity
  • If E=2047 and F is zero and S is 0, then V=Infinity
  • If 0<E<2047 then V=(-1)**S * 2 ** (E-1023) * (1.F) where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
  • If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-1022) * (0.F) These are "unnormalized" values.
  • If E=0 and F is zero and S is 1, then V=-0
  • If E=0 and F is zero and S is 0, then V=0

Reference:
ANSI/IEEE Standard 754-1985,
Standard for Binary Floating Point Arithmetic.

VonC
A: 

As to the question "Can the ps3 and xbxo 360 pull off double precision floating point operations or only single precision and in generel use is the double precision capabilities made use of (if they exist?)."

I believe that both platforms are incapable of double floating point. The original Cell processor only had 32 bit floats, same with the ATI hardware which the XBox 360 is based on (R600). The Cell got double floating point support later on, but I'm pretty sure the PS3 doesn't use that chippery.

codekaizen
Care to mention what's wrong with this?
codekaizen
+1 for "chippery"
A: 

Double precision means the numbers takes twice the word-length to store. On a 32-bit processor, the words are all 32 bits, so doubles are 64 bits. What this means in terms of performance is that operations on double precision numbers take a little longer to execute. So you get a better range, but there is a small hit on performance. This hit is mitigated a little by hardware floating point units, but its still there.

The N64 used a MIPS R4300i-based NEC VR4300 which is a 64 bit processor, but the processor communicates with the rest of the system over a 32-bit wide bus. So, most developers used 32 bit numbers because they are faster, and most games at the time did not need the additional precision (so they used floats not doubles).

All three systems can do single and double precision floating operations, but they might not because of performance. (although pretty much everything after the n64 used a 32 bit bus so...)

Alex
A: 

Single precision number uses 32 bits, with the MSB being sign bit, whereas double precision number uses 64bits, MSB being sign bit Single precision-SEEEEEEEEFFFFFFFFFFFFFFFFFFFFFFF.(SIGN+EXPONENT+SIGNIFICAND) Double precision-SEEEEEEEEEEEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF.(SIGN+EXPONENT+SIGNIFICAND)