ansaurus

Question

erroneous Visual C float / double conversion?

Answer 1

+3 A:

This isn't an issue with VC++ or anything like that - it's a fundamental issue with how floating point numbers are stored on the computer. For more information, see IEEE-754.

Converting a float to a double preserves the float's value exactly, which is why converting back from double to float gives you exactly the same float you started with. The precision was lost when the value was first stored as a float, and widening it to a double cannot bring it back. I'm not aware of any way around that loss, except to use only doubles when you need the extra precision. Rounding the converted float to two decimal places may set it to the value you expect, but I'm not sure of that.
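As a rough sketch of both points in C++ (the helper names are made up for illustration, and rounding only helps if you already know the value was meant to have two decimal places):

```cpp
#include <cmath>

// Widen a float to double, then narrow it back. Every finite float is
// exactly representable as a double, so the round trip changes nothing.
bool round_trip_is_exact(float f) {
    double widened = f;                            // exact, no rounding
    float narrowed = static_cast<float>(widened);  // back to the same float
    return narrowed == f;
}

// Hypothetical helper: round to two decimal places. The division then
// lands on the double nearest to the two-decimal value.
double round_to_2dp(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

With `float f1 = 42.48f; double d2 = f1;`, the round trip is exact, `d2 == 42.48` is false, and `round_to_2dp(d2) == 42.48` is true on an IEEE-754 implementation.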

Daniel G 2010-04-07 09:08:06

Answer 2

+2 A:

There is nothing wrong with what is happening here.

Because of the way floating point numbers are represented in memory, 42.479999999999997 is the closest representation of 42.48 that a double can have.

Read this paper: http://docs.sun.com/source/806-3568/ncg_goldberg.html

It explains what's happening there. There is unfortunately nothing you can do about the storage of it.
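If you want to see this for yourself, a small sketch along these lines prints a double and its two neighbours (the function names are my own, and it assumes IEEE-754 doubles):

```cpp
#include <cmath>
#include <cstdio>
#include <limits>
#include <string>

// Format a double with 17 significant digits, which is enough to tell
// any two distinct doubles apart.
std::string show17(double x) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.17g", x);
    return std::string(buf);
}

// The representable doubles immediately below and above x.
double prev_double(double x) {
    return std::nextafter(x, -std::numeric_limits<double>::infinity());
}
double next_double(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity());
}
```

`show17(42.48)` gives 42.479999999999997, while the neighbours come out as 42.47999999999999 and 42.480000000000004. Both are further from 42.48, so there is no better double to pick.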

Salgar 2010-04-07 09:08:47

+1 for the link to "What Every Computer Scientist Should Know About Floating-Point Arithmetic"

Paul R 2010-04-07 09:25:25

Answer 3

+2 A:

The value in f1 and the value in d2 both represent the exact same number. That number is not exactly 42.480000, neither is it exactly 42.479999542236328, although it does have a decimal representation which terminates. When displaying floats, your debug view is sensibly rounding at the precision of a float, and when displaying doubles it's rounding at the precision of a double. So you see about twice as many significant figures of the mystery value when you convert and display as a double.
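You can imitate what the debug view is doing by printing the same stored value with different numbers of significant digits. This is only a sketch: `show_sig` is a hypothetical helper, and I'm assuming the debugger rounds roughly the way printf's %g does.

```cpp
#include <cstdio>
#include <string>

// Format x with the requested number of significant digits, using the
// rounding rules of printf's %g conversion.
std::string show_sig(double x, int digits) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, x);
    return std::string(buf);
}
```

For `float f1 = 42.48f;`, `show_sig(f1, 8)` gives 42.48 and `show_sig(f1, 17)` gives 42.479999542236328. It is the same number both times, just shown with more digits.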

d1 contains a better approximation to 42.48 than the mystery value, since d1 contains the closest double to 42.48, whereas f1 and d2 only contain the closest float value to 42.48. What did you expect d2 to contain? f1 can't "remember" that it's "really supposed to be" 42.48, so that when it converts to double it gets "more accurate".

The way to avoid it depends on which serious numerical problems you mean. If the problem is that d1 and d2 don't compare equal, and you think they should, then the answer is to include a small tolerance in your comparisons, for example, replace d1 == d2 with:

fabs(d1 - d2) <= (fabs(d2) * FLT_EPSILON)

That is just an example, though; I haven't checked whether it deals with this case. You have to pick a tolerance that works for you, and you might also have to worry that d2 might be zero.
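A slightly more defensive version might look like the sketch below. The name `nearly_equal` and the default tolerances are my own assumptions, not anything standard, so tune them for your data.

```cpp
#include <algorithm>
#include <cassert>
#include <cfloat>
#include <cmath>

// Hypothetical helper: true when a and b differ by no more than a relative
// tolerance (scaled by the larger magnitude) or a fixed absolute tolerance.
// The absolute term is what keeps the comparison usable near zero.
bool nearly_equal(double a, double b,
                  double rel_tol = FLT_EPSILON,
                  double abs_tol = 0.0) {
    assert(rel_tol >= 0.0 && abs_tol >= 0.0);
    double scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= std::max(rel_tol * scale, abs_tol);
}
```

With `double d1 = 42.48; double d2 = 42.48f;`, `d1 == d2` is false but `nearly_equal(d1, d2)` is true. Taking magnitudes means negative values behave the same way, and passing a non-zero `abs_tol` covers the case where one side is zero.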

If the problem is that d2 is not a sufficiently accurate value for your algorithm to produce accurate results, then you have to avoid float values, and/or use a more numerically stable algorithm.
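One common reading of "avoid float values" is to keep running totals in double even when the inputs are floats. Here is a minimal sketch of the difference, with hypothetical function names and a repeated sum standing in for whatever your algorithm really does:

```cpp
#include <cstddef>

// Add the same float value n times using a float accumulator.
// Each addition rounds to float precision, so the error grows with n.
float sum_in_float(float value, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += value;
    return total;
}

// Same loop with a double accumulator. The input is still only a float,
// but the additions themselves lose far less.
double sum_in_double(float value, std::size_t n) {
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) total += value;
    return total;
}
```

Adding 42.48f a million times, the double accumulator matches 1000000.0 * (double)42.48f exactly, while the float accumulator drifts well away from it. Neither total is exactly 42480000, because the input was already rounded to a float.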

Steve Jessop 2010-04-07 10:59:02
