I am doing some floating-point arithmetic and having precision problems. The resulting value is different on two machines for the same input. I read the post at http://stackoverflow.com/questions/3031143/why-cant-i-multiply-a-float and other material on the web, and understood that it has to do with the binary representation of floating point and with machine epsilon. However, I wanted to check whether there is a way to solve this problem, or some workaround for floating-point arithmetic in C++.

I am converting a float to an unsigned short for storage and converting it back when necessary. However, when I convert it back, the precision (to 6 decimal places) remains correct on one machine but fails on the other.

// convert float to unsigned short
unsigned short sConst = 0xFFFF;
unsigned short shortValue = (unsigned short)(floatValue * sConst);

// convert unsigned short back to float
float floatValue = ((float)shortValue / sConst);
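For reference, a minimal, compilable version of that round trip (the sample value and variable names are illustrative, not from the original code) makes it easy to print the intermediate short on both machines and see where the results first diverge:

#include <cstdio>

int main() {
    const unsigned short sConst = 0xFFFF;       // 65535 quantization steps
    float original = 0.123456f;                 // illustrative value in [0, 1]

    // quantize: scale into [0, 65535]; the cast truncates toward zero
    unsigned short stored = (unsigned short)(original * sConst);

    // dequantize: scale back into [0, 1]
    float restored = (float)stored / sConst;

    std::printf("stored = %u, restored = %.6f\n", stored, restored);
    return 0;
}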
A: 

Are you looking for a standard like this one (a draft technical report)?

Programming Languages — C++ — Technical Report (Type 2) on extensions for the programming language C++ to support decimal floating-point arithmetic (draft)

Sheen
There is no evidence in the question that the floats have a terminating decimal representation.
David Thornley
+1  A: 

If you want to use native floating point types, the best you can do is to assert that the values output by your program do not differ too much from a set of reference values.

The precise definition of "too much" depends entirely on your application. For example, if you compute a + b on different platforms, you should find the two results to be within machine precision of each other. On the other hand, if you're doing something more complicated like matrix inversion, the results will most likely differ by more than machine precision. Determining precisely how close you can expect the results to be to each other is a very subtle and complicated process. Unless you know exactly what you are doing, it is probably safer (and saner) to determine the amount of precision you need downstream in your application and verify that the result is sufficiently precise.

To get an idea about how to compute the relative error between two floating point values robustly, see this answer and the floating point guide linked therein:

http://stackoverflow.com/questions/3874627/floating-point-comparison-functions-for-c/3875619#3875619
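For illustration, a minimal sketch of such a comparison (the function name and tolerances are placeholders; the linked answer covers the corner cases more carefully):

#include <algorithm>
#include <cmath>

// True if a and b agree to within a relative tolerance,
// with an absolute tolerance as a fallback for values near zero.
bool nearlyEqual(float a, float b,
                 float relTol = 1e-5f, float absTol = 1e-8f) {
    float diff = std::fabs(a - b);
    if (diff <= absTol)
        return true;
    return diff <= relTol * std::max(std::fabs(a), std::fabs(b));
}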

Philip Starhill
Thanks Philip for the answer! Updated the question.
Vidya Sagar
+2  A: 

A short must be at least 16 bits, and on a great many implementations that's exactly what it is. An unsigned 16-bit short holds values from 0 to 65535, so it cannot represent a full five decimal digits of precision, let alone six. If you want six decimal digits, you need at least 20 bits, since 2^20 = 1,048,576 is the first power of two above 10^6.

Therefore, any loss of precision is likely due to the fact that you're trying to pack six digits of precision into a type that holds fewer than five. There is no way around this other than using a wider integral type, which will probably take as much storage as the float itself.
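As a rough sketch of that wider-type idea (the uint32_t and the 10^6 scale factor are illustrative choices, not from the answer):

#include <cstdint>

const uint32_t kScale = 1000000;                // 10^6 steps; needs 20 bits

// quantize a value in [0, 1] with six decimal digits of resolution
uint32_t toFixed(float value) {
    return (uint32_t)(value * kScale + 0.5f);   // round to nearest step
}

// recover the (approximate) original value
float fromFixed(uint32_t stored) {
    return (float)stored / kScale;
}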

I don't know why it would seem to work on one given system. Were you using the same numbers on both? Did one machine use an older floating-point implementation that coincidentally gave the results you were expecting on the samples you tried? Was one of them possibly using a larger short than the other?

David Thornley