I am looking for suggestions on how to find the sizes (in bits) and ranges of floating-point numbers in an architecture-independent manner. The code could be built on various platforms (AIX, Linux, HP-UX, VMS, maybe Windoze) using different flags, so results may vary. I've only ever seen the sign stored as one bit, but how do I measure the size of the exponent and the mantissa?
Have a look at the values defined in float.h (FLT_RADIX, FLT_MANT_DIG, DBL_MAX_EXP and friends). Those should give you the values you need.
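As a sketch of how far <float.h> alone gets you, the code below prints the significand digits and derives the exponent field width from FLT_MIN_EXP/FLT_MAX_EXP. Note that C describes the significand as 0.1xxx in base FLT_RADIX, so FLT_MAX_EXP is 128 and FLT_MIN_EXP is -125 for IEEE single precision, each one more than the IEEE exponents 127 and -126. The helper name exponent_field_bits is hypothetical, and the derivation assumes an IEEE 754-style binary format in which the all-zeros and all-ones exponent codes are reserved; it is not meaningful for non-IEEE formats.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: width of the exponent field, derived from <float.h>.
   Assumes an IEEE 754-style binary format (FLT_RADIX == 2) where the field
   holds (max_exp - min_exp + 1) normal exponents plus 2 reserved codes
   (zero/subnormal and infinity/NaN), i.e. 2^bits codes in total. */
static int exponent_field_bits(int min_exp, int max_exp)
{
    long codes = (long)max_exp - min_exp + 3;
    int bits = 0;
    while (codes > 1) {          /* integer log2 of the code count */
        codes >>= 1;
        bits++;
    }
    return bits;
}

/* *_MANT_DIG includes the hidden leading bit, so for float and double the
   stored fraction field is one bit shorter than the value printed here. */
void print_float_layout(void)
{
    printf("FLT_RADIX = %d\n", FLT_RADIX);
    printf("float : %2d significand bits, %2d exponent bits, max %g\n",
           FLT_MANT_DIG, exponent_field_bits(FLT_MIN_EXP, FLT_MAX_EXP),
           (double)FLT_MAX);
    printf("double: %2d significand bits, %2d exponent bits, max %g\n",
           DBL_MANT_DIG, exponent_field_bits(DBL_MIN_EXP, DBL_MAX_EXP),
           DBL_MAX);
}
```

On an IEEE 754 platform this should report 24/8 for float and 53/11 for double.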
For the IEEE 754 single and double precision formats, the number of bits used to store each field is fixed:
Field widths in bits, with bit positions in brackets:
                   Sign     Exponent     Fraction     Bias
Single precision   1 [31]   8 [30-23]    23 [22-00]   127
Double precision   1 [63]   11 [62-52]   52 [51-00]   1023
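To see the single precision layout from the table in practice, here is a minimal sketch that splits a float into its three fields. It assumes float is a 32-bit IEEE 754 binary32 value (true on mainstream Linux, AIX and Windows targets, not for the VAX floating formats VMS can use); the struct and function names are hypothetical. memcpy is used instead of casting a float* to a uint32_t*, which would break strict aliasing.

```c
#include <stdint.h>
#include <string.h>

/* Fails to compile if float is not 32 bits wide. */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float is not 32 bits");

/* Hypothetical container for the single precision fields. */
struct float_fields {
    unsigned sign;      /* bit 31 */
    unsigned exponent;  /* bits 30-23, biased by 127 */
    uint32_t fraction;  /* bits 22-0, hidden leading 1 is not stored */
};

struct float_fields split_float(float f)
{
    uint32_t bits;
    struct float_fields r;

    memcpy(&bits, &f, sizeof bits);           /* reinterpret the bytes */
    r.sign = (unsigned)(bits >> 31);
    r.exponent = (unsigned)((bits >> 23) & 0xFFu);
    r.fraction = bits & 0x7FFFFFu;
    return r;
}
```

For example, -2.5f is -1.25 x 2^1, so it should split into sign 1, biased exponent 128 and fraction 0x200000 (0.25 x 2^23).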
EDIT: As Jonathan pointed out in the comments, I left out the long double type. I'll leave its bit decomposition as an exercise for the reader. :)
Since you're building for a number of systems, I assume you may be compiling with GCC.
Some good information on IEEE 754 floating point, which almost all modern architectures use: http://en.wikipedia.org/wiki/IEEE_754
This page details some of the differences that can come up: http://www.network-theory.co.uk/docs/gccintro/gccintro_70.html
As you follow the links suggested in previous comments, you'll probably see references to What Every Computer Scientist Should Know About Floating-Point Arithmetic. By all means, take the time to read this paper. It pops up everywhere floating point is discussed.
It's relatively easy to find out empirically (below, myfloat stands for the type under test, e.g. typedef double myfloat;):
Decimal or binary:
myfloat a = 2.0, b = 0.0;
for (int i = 0; i < 20; i++) b += 0.1;
/* a == b => decimal, otherwise binary */
Reason: every binary system can represent 2.0 exactly, but any binary system has an error term when representing 0.1. Accumulating the error over 20 additions makes it very unlikely to cancel out, as it can in a single rounded operation: for example, 1.0 == 3.0*(1.0/3.0) holds even in binary double precision.
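A runnable sketch of the decimal-or-binary test, assuming myfloat is a typedef you set to the type you want to probe (double here) and that looks_decimal is a hypothetical function name. b is volatile so each partial sum is stored in a myfloat rather than kept at higher precision in a register.

```c
/* Hypothetical typedef: set to the floating type being probed. */
typedef double myfloat;

/* Returns 1 if adding 0.1 twenty times gives exactly 2.0 (suggesting a
   decimal radix), 0 otherwise (binary radix: 0.1 is not representable). */
int looks_decimal(void)
{
    myfloat a = 2.0;
    volatile myfloat b = 0.0;   /* force each partial sum to memory */
    int i;

    for (i = 0; i < 20; i++)
        b += (myfloat)0.1;
    return a == b;
}
```

With IEEE double and round-to-nearest, the sum comes out one ulp above 2.0 (2.0000000000000004), so the function returns 0.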
Mantissa length:
myfloat a = 1.0, b = 1.0, inc = 1.0;
volatile myfloat c;  /* volatile keeps x87 builds from using 80-bit intermediates */
int mantissabits = 0;
do {
    mantissabits++;
    inc *= 0.5;  /* effectively shift to the right */
    c = b + inc;
} while (a != c);
You are adding decreasing terms until you reach the capacity of the mantissa. It returns 24 bits for float and 53 bits for double, which is correct: the stored mantissa contains only 23/52 bits, but because the leading bit of a normalized value is always 1, it is not stored, giving one hidden extra bit.
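The mantissa loop as a self-contained function, a sketch assuming the hypothetical myfloat typedef (double here) and the hypothetical name mantissa_bits. The volatile on c matters: on x87 builds where FLT_EVAL_METHOD is 2, c = one + inc may otherwise be evaluated in an 80-bit register and the loop would report 64 instead of 53.

```c
/* Hypothetical typedef: set to the floating type being probed. */
typedef double myfloat;

/* Counts significand bits, including the hidden bit, by adding ever
   smaller powers of two to 1.0 until the sum rounds back to 1.0. */
int mantissa_bits(void)
{
    const myfloat one = 1.0;
    myfloat inc = 1.0;
    volatile myfloat c;   /* store each sum at myfloat precision */
    int bits = 0;

    do {
        bits++;
        inc *= 0.5;       /* next bit position to the right */
        c = one + inc;
    } while (c != one);
    return bits;
}
```

Changing the typedef to float should give 24, matching FLT_MANT_DIG.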
Exponent length:
myfloat a = 1.0;
int max = 0, min = 0;
for (;;) { a *= 2.0; if (isfinite(a)) max++; else break; }  /* isfinite() is from <math.h>; how overflow shows up depends on the system */
a = 1.0;
for (;;) { a *= 0.5; if (a != 0.0) min--; else break; }
You are shifting 1.0 to the left or to the right until you hit the top or the bottom. For IEEE 754 formats the normalized exponent range is -(max-1) to max (-126 to 127 for float, -1022 to 1023 for double). If min is smaller than -(max-1), the format has subnormals, as float and double do: the loops give min = -149 and -1074 respectively. Positive and negative values are normally symmetric, but you can check by running the same loops starting from -1.0.
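A runnable sketch of the exponent probes, again assuming the hypothetical myfloat typedef (double here); the function names are hypothetical. It assumes overflow produces infinity as in IEEE 754 (on other formats overflow may trap instead). Build it without -ffast-math: that flag lets GCC assume infinities never occur and can enable flush-to-zero, both of which break these loops. The last helper turns max into an exponent field width using the IEEE relation max = 2^(bits-1) - 1.

```c
#include <math.h>

/* Hypothetical typedef: set to the floating type being probed. */
typedef double myfloat;

/* Largest k such that 2^k is still finite. */
int max_exponent(void)
{
    volatile myfloat a = 1.0;   /* avoid wider intermediates */
    int max = 0;

    for (;;) {
        a *= 2.0;
        if (!isfinite(a))
            break;
        max++;
    }
    return max;
}

/* Most negative k such that 2^k is still nonzero (includes subnormals). */
int min_exponent(void)
{
    volatile myfloat a = 1.0;
    int min = 0;

    for (;;) {
        a *= 0.5;
        if (a == 0.0)
            break;
        min--;
    }
    return min;
}

/* Exponent field width for an IEEE-style format, where max = 2^(bits-1) - 1. */
int exponent_bits_from_max(int max)
{
    int bits = 1;

    while ((1L << (bits - 1)) - 1 < max)
        bits++;
    return bits;
}
```

For IEEE double this should give max = 1023, min = -1074 and an 11-bit exponent field; with float, 127, -149 and 8.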