ansaurus

Question

Unexpected loss of precision when dividing doubles

Answer 1

+5 A:

Could it be that you use DirectX or OpenGL in your project? If so they can turn off double precision and you will get strange results.

You can check your precision settings with

std::sqrt(x) * std::sqrt(x)

The result has to be pretty close to x. I ran into this problem a long time ago and spent a month checking all my formulas before I found

D3DCREATE_FPU_PRESERVE

Mykola Golubyev 2009-03-30 09:34:38

exactly how do they go about doing that?

anon 2009-03-30 09:36:17

There are options when you initialize Direct3D. I don't remember the name, but I lost a month checking all the formulas in my diploma project, and only then did I run a simple check with "sqrt(x)*sqrt(x)". The precision was badly broken unless I turned off the option.

Mykola Golubyev 2009-03-30 09:41:24

When compiled as a standard Win32 console app in VS2008, it gives the correct answer. I agree, and I'd say it's a compiler setting.

Binary Worrier 2009-03-30 09:41:28

Answer 2

+3 A:

The problem here is that (c-a) is small, so the rounding errors inherent in floating point operations are magnified in this example. A general solution is to rework your equation so that you're not dividing by a small number. I'm not sure how you would do that here, though.

EDIT:

Neil is right in his comment on this question. I computed the answer in VB using Doubles and got the same answer as Mathematica.

Patrick McDonald 2009-03-30 09:36:19

see the code I posted - that is not the problem

anon 2009-03-30 09:43:58

Answer 3

+5 A:

The following code:

#include <iostream>
using namespace std;

double getSlope(double a, double b, double c, double d){
    double slope;
    slope=(d-b)/(c-a);
    return slope;
}

int main( ) {
    double s = getSlope(2.71156, -1.64161, 2.70413, -1.72219);
    cout << s << endl;
}

gives a result of 10.8452 with g++. How are you printing out the result in your code?

anon 2009-03-30 09:40:28

It doesn't matter how you print 10.845222072678331, it won't round or truncate to 10.8557

Pete Kirkham 2009-03-30 13:35:36

Answer 4

A:

I print the result using the command line:

std::cout<<slope<<endl;

It may be that my parameters are not good, as I read them from another program which computes a graph. After reading the parameters from this graph I just displayed them to see their values, but maybe the displayed vectors do not have the same internal precision as the calculated values. I do not know; it is really strange. Some numerical errors appear.

When the graph from which I am reading my parameters is computed, some numerical libraries written in C++ (with templates) are used. No OpenGL is used for this computation.

thank you, madalina

madalina 2009-03-30 09:50:29

You can add refinements to your question by editing it. That will be more useful!

Benoît 2009-03-30 10:33:39

Answer 5

+1 A:

Better print out the arguments, too. If, as I guess, you are transferring the parameters in decimal notation, you will lose precision on each and every one of them. The problem is that 1/5 is an infinite series in binary, so e.g. 0.2 becomes 0.001100110011.... Also, digits are chopped when converting a binary float to a textual representation in decimal.

Besides that, the compiler sometimes chooses speed over precision. This should be controlled by a documented compiler switch.

xtofl 2009-03-30 10:01:56

Answer 6

A:

Yes, this is what I did. I listed the parameters with precision up to 15 digits, and so my former parameters were just approximations.

Well, if my parameters are chopped then I am guessing my computations will be approximate for sure. thank you, madalina

madalina 2009-03-30 10:04:40

If you want to comment on something you should add comments to the corresponding posts. If you want to clarify your question you can edit the question. You should not add answers that are not really answers. Will you please also delete them?

sharptooth 2009-03-30 10:12:59

Answer 7

A:

Patrick seems to be right about (c-a) being the main cause:

d-b = -1,72219 - (-1,64161) = -0,08058

c-a = 2,70413 - 2,71156 = -0,00743

S = (d-b)/(c-a)= -0,08058 / -0,00743 = 10,845222

You start out with six digits of precision, and through the subtraction you get a reduction to three and four digits. My best guess is that you lose additional precision because the number -0,00743 cannot be represented exactly in a double. Try using intermediate variables with a bigger precision, like this:

double QSweep::getSlope(double a, double b, double c, double d)
{
    double slope;
    long double temp1, temp2;

    temp1 = (d-b);
    temp2 = (c-a);
    slope = temp1/temp2;

    return slope;
}

Treb 2009-03-30 10:06:36

did you look at the code I posted?

anon 2009-03-30 10:09:33

You seem to have confused precision (how the number is represented) with accuracy (what the tolerance on the values is). Whether you specify a double as 2.70413 or 2.7041300000 makes no difference to the result in C++

Pete Kirkham 2009-03-30 10:49:56

@Pete Kirkham: It is not possible to represent a value of e.g. 0.1 *exactly* in a double, so storing it in a variable with a bigger precision can give different results.

Treb 2009-03-30 12:22:20

As I show in my answer, "You start out with six digits precision" is irrelevant to the result the code in the OP gives. It is relevant to how many figures of that result you should care about, but (c-a) is not the cause of the error if the result is calculated using 64bit double.

Pete Kirkham 2009-03-30 16:21:19

Answer 8

+7 A:

I've tried with float instead of double and I get 10.845110 as a result. It still looks better than madalina's result.

EDIT:

I think I know why you get these results. If you get the a, b, c and d parameters from somewhere else and print them, the output gives you rounded values. Then if you put those into Mathematica (or calc ;) ) it will give you a different result.

I tried changing a little bit one of your parameters. When I did:

double c = 2.7041304;

I get 10.845806. I only added 0.0000004 to c! So I think your "errors" aren't errors. Print a, b, c and d with better precision and then put them into Mathematica.

klew 2009-03-30 10:47:41

Answer 9

+2 A:

The results you are getting are consistent with 32bit arithmetic. Without knowing more about your environment, it's not possible to advise what to do.

Assuming the code shown is what's running, ie you're not converting anything to strings or floats, then there isn't a fix within C++. It's outside of the code you've shown, and depends on the environment.

As Patrick McDonald and Treb both brought up the accuracy of your inputs and the error on a-c, I thought I'd take a look at that. One technique for looking at rounding errors is interval arithmetic, which makes explicit the upper and lower bounds that a value represents (they are implicit in floating point numbers, and are fixed by the precision of the representation). By treating each value as an upper and lower bound, and by extending the bounds by the error in the representation (approx x * 2 ^ -53 for a double value x), you get a result which gives the lower and upper bounds on the accuracy of a value, taking into account worst case precision errors.

For example, if you have a value in the range [1.0, 2.0] and subtract from it a value in the range [0.0, 1.0], then the result must lie in the range [below(0.0),above(2.0)] as the minimum result is 1.0-1.0 and the maximum is 2.0-0.0. below and above are equivalent to floor and ceiling, but for the next representable value rather than for integers.

Using intervals which represent worst-case double rounding:

getSlope(
 a = [2.7115599999999995262:2.7115600000000004144], 
 b = [-1.6416099999999997916:-1.6416100000000002357], 
 c = [2.7041299999999997006:2.7041300000000005888], 
 d = [-1.7221899999999998876:-1.7221900000000003317])
(d-b) = [-0.080580000000000526206:-0.080579999999999665783]
(c-a) = [-0.0074300000000007129439:-0.0074299999999989383218]

to double precision [10.845222072677243474:10.845222072679954195]

So although c-a is small compared to c or a, it is still large compared to double rounding, so even if you were using the worst imaginable double precision rounding, you could trust that value to be precise to 12 figures - 10.8452220727. You've lost a few figures off double precision, but you're still working to more than your input's significance.

But if the inputs were only accurate to the number of significant figures shown, then rather than being the double value 2.71156 +/- eps, the input range would be [2.711555,2.711565], so you get the result:

getSlope(
 a = [2.711555:2.711565], 
 b = [-1.641615:-1.641605], 
 c = [2.704125:2.704135], 
 d = [-1.722195:-1.722185])
(d-b) = [-0.08059:-0.08057]
(c-a) = [-0.00744:-0.00742]

to specified accuracy [10.82930108:10.86118598]

which is a much wider range.

But you would have to go out of your way to track the accuracy in the calculations, and the rounding errors inherent in floating point are not significant in this example - it's precise to 12 figures with the worst case double precision rounding.

On the other hand, if your inputs are only known to 6 figures, it doesn't actually matter whether you get 10.8557 or 10.8452. Both are within [10.82930108:10.86118598].

Pete Kirkham 2009-03-30 12:31:31

Answer 10

A:

Okay so this is an old question, but I have the exact same scenario: finding the slope in regions which have arbitrarily close points.

What brought me to stackoverflow was the idea of finding a method to rejig the equation. I was hoping to verify that:

float get_slope(float dXa, float dXb, float dYa, float dYb) {
    return (dXa - dXb)/(dYa - dYb);
}

could be improved with:

float get_slope(float dXa, float dXb, float dYa, float dYb) {
    return  dXa/(dYa - dYb) - dXb(dYa - dYb);
}

My questions then are:

Is there a better way to rework the equation?
Should there be multiple equations (paths) in the get_slope() function?

Jamie 2009-05-27 18:25:37

You would be better off posing this as a brand new question. Just give a link to this question in yours, so it isn't closed as a duplicate.

Mark Ransom 2009-05-27 18:35:31

I should clarify - give the link to this question, then state that it didn't answer your own question. Make sure the question is clear!

Mark Ransom 2009-05-27 18:36:52

And finally: your "improved" get_slope will always return 0. I'm sure there's a typo somewhere.

Mark Ransom 2009-05-27 18:38:53

Answer 11

A:

While the academic discussion going on is great for learning about the limitations of programming languages, you may find the simplest solution to the problem is a data structure for arbitrary precision arithmetic.

This will have some overhead, but you should be able to find something with reasonably well-guaranteed accuracy.

IanGilham 2009-05-27 19:10:22

Recommending arbitrary precision arithmetic, although popular on StackOverflow, is not the best answer to every question about floating-point calculations.

quant_dev 2009-09-08 21:04:40

That is true, but it is often the simplest workable solution nonetheless. There are often better, faster and more complex ways to do things, but even simplicity alone has a lot of value in a software project.

IanGilham 2009-10-19 10:22:27
