views:

355

answers:

2

I am aware of how floating point precision works in the regular cases, but I stumbled on an odd situation in my C# code.

Why aren't result1 and result2 the exact same floating point value here?


const float A = 0.312312432f;   // Arbitrary value
const float B = 25.0f;          // Arbitrary value
const float dt = 0.017001f;

float result1 = (A*B)*dt;

float result2 = (A*B);
result2 *= dt;

From this page I figured that float arithmetic is left-associative, meaning values are evaluated and calculated in a left-to-right manner.

The full source code involves XNA's Quaternions. I don't think it's relevant what my constants are or what VectorHelper.AddPitchRollYaw() does. The test passes just fine if I calculate the delta pitch/roll/yaw angles in the same manner in both places, but as the code stands below it does not pass:


X
  Expected: 0.275153548f
  But was:  0.275153786f

[TestFixture]
    internal class QuaternionPrecisionTest
    {
        [Test]
        public void Test()
        {
            JoystickInput input;
            input.Pitch = 0.312312432f;
            input.Roll = 0.512312432f;
            input.Yaw = 0.912312432f;
            const float dt = 0.017001f;

            float pitchRate = input.Pitch * PhysicsConstants.MaxPitchRate;
            float rollRate = input.Roll * PhysicsConstants.MaxRollRate;
            float yawRate = input.Yaw * PhysicsConstants.MaxYawRate;

            Quaternion orient1 = Quaternion.Identity;
            Quaternion orient2 = Quaternion.Identity;

            for (int i = 0; i < 10000; i++)
            {
                float deltaPitch = 
                      (input.Pitch * PhysicsConstants.MaxPitchRate) * dt;
                float deltaRoll = 
                      (input.Roll * PhysicsConstants.MaxRollRate) * dt;
                float deltaYaw = 
                      (input.Yaw * PhysicsConstants.MaxYawRate) * dt;

                // Add deltas of pitch, roll and yaw to the rotation matrix
                orient1 = VectorHelper.AddPitchRollYaw(
                                orient1, deltaPitch, deltaRoll, deltaYaw);

                deltaPitch = pitchRate * dt;
                deltaRoll = rollRate * dt;
                deltaYaw = yawRate * dt;
                orient2 = VectorHelper.AddPitchRollYaw(
                                orient2, deltaPitch, deltaRoll, deltaYaw);
            }

            Assert.AreEqual(orient1.X, orient2.X, "X");
            Assert.AreEqual(orient1.Y, orient2.Y, "Y");
            Assert.AreEqual(orient1.Z, orient2.Z, "Z");
            Assert.AreEqual(orient1.W, orient2.W, "W");
        }
    }

Granted, the error is small and only presents itself after a large number of iterations, but it has caused me some great headaches.

+6  A: 

I couldn't find a reference to back this up but I think it is due to the following:

  • float operations are calculated at whatever precision the hardware provides, which means they can be done with greater precision than that of float.
  • the assignment to the intermediate result2 variable forces rounding back to float precision, but the single expression for result1 can be computed entirely at native precision before being rounded down.
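A minimal sketch of the effect (the values are hypothetical, and whether the two paths actually diverge depends on the JIT and the hardware):

```csharp
using System;

// Hypothetical values -- any non-trivial floats will do.
float a  = 0.312312432f;
float b  = 25.0f;
float dt = 0.017001f;

// Single expression: the intermediate a*b may be kept in an
// extended-precision register (e.g. the 80-bit x87 registers on x86).
float result1 = (a * b) * dt;

// Storing the intermediate in a local may force it back to 32 bits,
// discarding the extra bits before the second multiplication.
float result2 = a * b;
result2 *= dt;

// An explicit cast to float rounds the intermediate regardless of
// what the JIT does with registers.
float result3 = (float)(a * b) * dt;

Console.WriteLine($"{result1:R} {result2:R} {result3:R}");
```

On an SSE-based JIT all three will typically be bit-identical; on a JIT that keeps intermediates in x87 registers, result1 may differ from result2 in the last bit.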

On a side note, testing float or double with == is always dangerous. The Microsoft unit testing framework provides an Assert.AreEqual(float expected, float actual, float delta) overload, so you could solve this problem with a suitable delta (based on float.Epsilon).
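For example, with NUnit (which the test fixture above already uses) a tolerance-based comparison would look like this; the values are the ones from the failing assertion output above, and 1e-5f is a hypothetical tolerance chosen purely for illustration:

```csharp
using NUnit.Framework;

[TestFixture]
internal class ToleranceExampleTest
{
    [Test]
    public void CompareWithTolerance()
    {
        // Values taken from the failing "Expected/But was" output above.
        float expected = 0.275153548f;
        float actual   = 0.275153786f;

        // Passes: the two values agree to within the tolerance,
        // unlike the exact comparison that failed.
        Assert.AreEqual(expected, actual, 1e-5f);
    }
}
```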

Henk Holterman
Interesting. A friend suggested that there might be a hardware instruction that performs the two float multiplications directly at enhanced precision, and that splitting the expression into two statements prevents the compiler from using this special instruction?
Andreas Larsen
@Andreas, AFAIK there is no x86 instruction to do that.
Henk Holterman
+1. I would add that the native precision of x86 processors is 80-bit double, the native precision of x64 is 64-bit double. You get the most accurate results by allowing the compiler to keep partial calculations in registers. Telling the compiler to convert partial products back to float _will_ cause loss of precision.
John Knoeller
@Holterman, the == comparison was intentional, to make sure the floats were 100% identical. I want zero deviation. @Knoeller, very interesting. I did not know processors had an internal precision. If this is true then it explains the behavior I am getting. Am I understanding it correctly that result1 is calculated at 64-bit precision, while this precision is lost when the preliminary value of result2 is stored to a 32-bit float?
Andreas Larsen
I am compiling towards x86 on an Intel x64 processor on Windows 7 64-bit.
Andreas Larsen
@Andreas: 100% here would be the precision of `float`. Highly unlikely that your problem domain requires exactly that.
Henk Holterman
@Holterman, actually I believe I do. I am writing a simulator for developing autopilot software. I have a physics simulation of the world that calculates the position and rotation of a vehicle and I have virtual sensors that measure these more or less accurately in order to estimate position and rotation. I need to know that the estimated state is perfectly equal to the simulated state as long as I don't add noise/inaccuracy to my sensors. After about 30 seconds of flying, the error started accumulating and it all boiled down to this floating point precision problem.
Andreas Larsen
The physics simulation performed result1 while the estimator performed result2 and so they slowly drifted apart after many iterations.
Andreas Larsen
@Andreas: in that case you are actually using the full precision of `float` and should use delta=0 (==). But maybe you should consider using `double` instead. Note that it will be at least as fast.
Henk Holterman
Use double instead and it will take 4 billion times longer for the drift to be noticeable. As for perfect accuracy: try dividing 10 by 3, then multiplying by 3.
Hans Passant
Thanks for the tips, but double won't work here. The precision (float or double) is not the real issue, but lower precision does reveal the bug more quickly. It is a long story, but to sum up: my simulator and my state estimator perform the same calculation with identical inputs, so in theory they should give the same result every time as long as their precision is equal. The problem was that they don't share code (for a legitimate reason), and although they were semantically equal I got this precision error, which was my core question.
Andreas Larsen
@Andreas: Not just "as long as their precision is equal". As you found out, you have to do every calculation in exactly the same way (same order). Forget all your commutative/associative rules, they don't apply here.
Henk Holterman
@Holterman, yes :-) Precisely! Thanks guys for all the insight!
Andreas Larsen
Couple of comments. First, "based on float epsilon" doesn't make sense. Epsilon is the smallest possible positive value that can be represented by a float; that value has practically nothing whatsoever to do with the magnitude of the error induced by rounding. The correct approach is to base the allowable delta upon the *measurement error in the physical quantities represented*. If you say that the length is 76.23 metres and the height is 2.44 metres, then the area should be computed with a tolerance based on the precision of the measurements: around 0.005.
Eric Lippert
Second, the assignment back to the local *probably* causes the rounding. However, other things can cause rounding, and the jitter is free to enregister the local, which would prevent rounding. All you can really say here is that this is the most likely cause of the rounding; you'd have to look at the precise code generated by the jitter to be sure.
Eric Lippert
+7  A: 

Henk is exactly right. Just to add a bit to that.

What's happening here is that if the compiler generates code that keeps the floating point operations "on the chip", then they can be done at higher precision. If the compiler generates code that moves the results back to the stack every so often, then every time it does so, the extra precision is lost.

Whether the compiler chooses to generate the higher-precision code or not depends on all kinds of unspecified details: whether you compiled debug or retail, whether you are running in a debugger or not, whether the floats are in variables or constants, what chip architecture the particular machine has, and so on.

Basically, you are guaranteed 32 bit precision OR BETTER, but you can NEVER predict whether you're going to get better than 32 bit precision or not. Therefore you are required to NOT rely upon having exactly 32 bit precision, because that is not a guarantee we give you. Sometimes we're going to do better, and sometimes not, and if you sometimes get better results for free, don't complain about it.

Henk said that he could not find a reference on this. It is section 4.1.6 of the C# specification, which states:

Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an “extended” or “long double” floating-point type with greater range and precision than the double type, and implicitly perform all floating-point operations using this higher precision type. Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. Other than delivering more precise results, this rarely has any measurable effects.

As for what you should do: First, always use doubles. There is no reason whatsoever to use floats for arithmetic. Use floats for storage if you want; if you have a million of them and want to use four million bytes instead of eight million bytes, that's a reasonable usage for floats. But floats COST you at runtime because the chip is optimized to do 64 bit math, not 32 bit math.

Second, do not rely upon floating point results being exact or reproducible. Small changes in conditions can cause small changes in results.
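If two separately-written code paths must nevertheless agree bit-for-bit, as in the simulator/estimator scenario from the comments above, one option (a sketch, relying on the language guarantee that an explicit cast to float rounds the value) is to cast every intermediate result explicitly; the names and values here are hypothetical:

```csharp
using System;

float pitch        = 0.312312432f;
float maxPitchRate = 25.0f;          // hypothetical constant
float dt           = 0.017001f;

// Path 1: one expression, but with every intermediate explicitly cast.
float deltaPitch1 = (float)((float)(pitch * maxPitchRate) * dt);

// Path 2: the same operations spread over two statements.
float pitchRate   = (float)(pitch * maxPitchRate);
float deltaPitch2 = (float)(pitchRate * dt);

// With each intermediate explicitly rounded to 32 bits, both paths
// perform an identical sequence of operations and must agree exactly.
Console.WriteLine(deltaPitch1 == deltaPitch2);
```

The casts cost some performance, but they take the decision away from the JIT, so the result no longer depends on which intermediates happen to be enregistered.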

Eric Lippert
Very nice and thorough follow-up. So I understand now that floating point arithmetic is generally not deterministic, but can I rely on the exact same calculation (and the same input) to be "deterministic" in my program? That is, is it possible/likely for the precision to change between calculations? The reason I am using float in the first place is because I am using the XNA game studio framework, which is float territory. Probably because of Xbox 360 and Zune using the same framework.
Andreas Larsen
Re: floats on XBOX: OK, that makes sense. I didn't realize their libraries used only floats. An odd choice. Re: can the precision of a specific calculation change between invocations? In theory, yes; the jitter is allowed to say "hey, based on data I've gathered from previous runs, I see that I can make more speed and precision if I regenerate the code like this..." In practice, the jitter does not actually do that. (To my knowledge; I am not an expert on the jitter.)
Eric Lippert