I have this SQL:

SELECT Sum(Field1), Sum(Field2), Sum(Field1)+Sum(Field2)
FROM Table
GROUP BY DateField
HAVING Sum(Field1)+Sum(Field2)<>0;

Problem is that sometimes the sums of Field1 and Field2 come out to values like 9.5 and -10.3, and the result is -0.800000000000001. Could anybody explain why this happens and how to solve it?

+1  A: 

It is probably an effect of floating point number implementations. Sometimes numbers cannot be exactly represented, and sometimes the result of operations is slightly off what we may expect for the same reason.
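This is easy to reproduce outside SQL; here's a minimal Python sketch (any language using IEEE 754 doubles behaves the same way), including the rounding workaround:

```python
# 9.5 - 10.3 in IEEE 754 double arithmetic is not exactly -0.8
diff = 9.5 - 10.3
print(diff)            # -0.8000000000000007
print(diff == -0.8)    # False

# Rounding trims the noise digits, mirroring the SQL Round() workaround
print(round(diff, 4))  # -0.8
```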

The fix would be to use a rounding function on the values to cut off the extraneous digits. Like this (I've simply rounded to 4 digits after the decimal point, but of course you should use whatever precision is appropriate for your data):

SELECT Sum(Field1), Sum(Field2), Round(Sum(Field1)+Sum(Field2), 4)
FROM Table
GROUP BY DateField
HAVING Round(Sum(Field1)+Sum(Field2), 4)<>0;
qstarin
That's the workaround. The **fix** is to change to decimal as it's unlikely his data is representing statistics or scientific measurements that need a float's peculiar differences over decimal...
Emtucifor
@Emtucifor: that is a good point.
qstarin
You need the rounding on the HAVING clause, too, or you'll include or exclude rows incorrectly.
Emtucifor
wow, that's what I get trying to answer SO questions at midnight on vacation.
qstarin
A: 

I'm certain it is because the float data type (aka Double or Single in MS Access) is inexact. It is not like decimal which is a simple value scaled by a power of 10. If I'm remembering correctly, float values can have different denominators which means that they don't always convert back to base 10 exactly.

The cure is to change Field1 and Field2 from float/single/double to decimal or currency. If you give examples of the smallest and largest values you need to store, including the smallest and largest fractions needed such as 0.0001 or 0.9999, we can possibly advise you better.

Be aware that versions of Access before 2007 can have problems with ORDER BY on decimal values. Please read the comments on this post for some more perspective on this. In many cases, this would not be an issue for people, but in other cases it might be.

In general, float should be used for values that can end up being extremely small or large (smaller or larger than a decimal can hold). You need to understand that float maintains more accurate scale at the cost of some precision. That is, a decimal will overflow or underflow where a float can just keep on going. But the float only has a limited number of significant digits, whereas a decimal's digits are all significant.
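As a rough illustration of that trade-off, here is a small Python sketch (Python's Decimal stands in for a database decimal column):

```python
from decimal import Decimal

# A double carries only ~15-16 significant decimal digits;
# beyond that, neighboring integers collapse to the same float
big = 10**17
print(float(big + 1) == float(big))  # True: the +1 is lost

# A decimal keeps every digit, up to its declared capacity
print(Decimal(big) + 1)              # 100000000000000001
```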

If you can't change the column types, then in the meantime you can work around the problem by rounding your final calculation. Don't round until the very last possible moment.

Update

A criticism has been leveled at my recommendation to use decimal: not the point about unexpected ORDER BY results, but the claim that float is overall more accurate with the same number of bits.

No contest to this fact. However, I think it is more common for people to be working with values that are in fact counted or are expected to be expressed in base ten. I see questions over and over in forums about what's wrong with their floating-point data types, and I don't see these same questions about decimal. That means to me that people should start off with decimal, and when they're ready for the leap to how and when to use float they can study up on it and start using it when they're competent.

In the meantime, while it may be a tad frustrating to have people always recommending decimal when you know it's not as accurate, don't let yourself get divorced from the real world where having more familiar rounding errors at the expense of very slightly reduced accuracy is of value.

Let me point out to my detractors that the example

Decimal(1) / 3 * 3 yielding 1.999999999999999999999999999

is, in what should be familiar words, "correct to 27 significant digits" which is "correct for all practical purposes."

So if we have two ways of doing what is practically speaking the same thing, and both of them can represent numbers very precisely out to a ludicrous number of significant digits, and both require rounding but one of them has markedly more familiar rounding errors than the other, I can't accept that recommending the more familiar one is in any way bad. What is a beginner to make of a system that can perform a - a and not get 0 as an answer? He's going to get confusion, and be stopped in his work while he tries to fathom it. Then he'll go ask for help on a message board, and get told the pat answer "use decimal". Then he'll be just fine for five more years, until he has grown enough to get curious one day and finally studies and really grasps what float is doing and becomes able to use it properly.

That said, in the final analysis I have to say that slamming me for recommending decimal seems just a little bit off in outer space.

Last, I would like to point out that the following statement is not strictly true, since it overgeneralizes:

The float and double types store numbers in base 2, not in base 10.

To be accurate, most modern systems store floating-point data types with a base of 2. But not all! Some use or have used base 10. For all I know, there are systems which use base 3 which is closer to e and thus has a more optimal radix economy than base 2 representations (as if that really mattered to 99.999% of all computer users). Additionally, saying "float and double types" could be a little misleading, since double IS float, but float isn't double. Float is short for floating-point, but Single and Double are float(ing point) subtypes which connote the total precision available. There are also the Single-Extended and Double-Extended floating point data types.

Emtucifor
Currency is a very useful type in Access.
Remou
Yes, the field was of Double type. Now I've converted it to decimal. Thank you.
HasanGursoy
Currency would work, too, since it's just a special case of decimal.
Emtucifor
Decimal still has problems in Access: http://allenbrowne.com/bug-08.html
David-W-Fenton
@David That is interesting... I didn't know that. What do you suggest, going back to float and dealing with rounding? What about currency?
Emtucifor
@David-W-Fenton: "Decimal still has problems in Access" -- that bug was fixed in 2007 version (ACE), was never as bad as described ("Nulls and zeros sort unpredictably" was not the case, etc) and had a simple workaround anyhow (e.g. `ORDER BY (num >= 0), num DESC`). If such an obscure bug can put you off a whole data type, I can show you bugs that will turn you off Access completely ;)
onedaywhen
@one thank you for the clarification.
Emtucifor
@onedaywhen: You're suggesting an ORDER BY that can't use the index is a good workaround for a bug? Whether or not the bugs were "as bad as described" is in the eye of the beholder -- it's essential when you're recommending using a data type that is poorly supported on the recommended platform that the pitfalls be mentioned. Whether it matters to any particular user will depend on their circumstances, but it's not really helpful to leave out mention of the problems, even if you personally consider them insignificant.
David-W-Fenton
Also, there is no decimal data type in VBA. For the very rare cases where I need decimal precision, I make do with currency fields, but it's a vanishingly rare issue in the problem space my apps exist in.
David-W-Fenton
@David-W-Fenton: "a vanishingly rare issue in the problem space my apps exist in" -- ...as I suspect is the requirement to sort a set of fixed-point decimal values that includes negative values in descending order, yet you are happy to wheel out this 'straw man' argument just because someone mentioned the `DECIMAL` type.
onedaywhen
@David-W-Fenton: "You're suggesting an ORDER BY that can't use the index is a good workaround for a bug?" -- you misquote me: I said "simple". But, yes, sometimes the existence of an index makes no difference. I often find that sorting an ADO recordset is faster than including an `ORDER BY` clause in a query. Like all optimization and performance questions, it depends.
onedaywhen
It depends on whether you need to sort by the column, too. I have an Access database rife with decimal columns and I never sort by them...
Emtucifor
Rather than worry about the hard-to-remember edge cases involving decimal data type, I use something that is dependable. Call me crazy. Not to mention the VBA data type problem, of course.
David-W-Fenton
So, @David, do you use floats instead? Even with their rounding problems? It sounds like you are already doing a workaround. As for the VBA data type problem, the few times I've needed to use decimal I've had no problem with CDec() and Variants.
Emtucifor
When I need decimal precision I use Currency (i.e., 4 decimal places). This has worked fine for the cases in which I've needed it. I will grant that perhaps the apps I've worked with are not numeric intensive in terms of the data being stored, so YMMV as to whether currency works instead of decimal.
David-W-Fenton
@David-W-Fenton: "Rather than worry about the hard-to-remember edge cases..., I use something that is dependable." -- don't all Access data types have 'edge cases' bugs, e.g. `YES/NO` (http://allenbrowne.com/bug-14.html), `MEMO` (http://allenbrowne.com/bug-18.html), `NVARCHAR` (http://allenbrowne.com/bug-15.html), ...? My fav quote from Mr Browne's site: "If you use queries, be aware! These can all fail: PARAMETERS clause, SELECT clause, DISTINCT predicate, FROM clause, WHERE clause, GROUP BY clause, ORDER BY clause" -- all have edge cases, so don't use queries?!
onedaywhen
@one good one! lol
Emtucifor
If my reasons for avoiding decimal were limited to the edge case argument, your response would be meaningful. Since there are other reasons it's problematic, your putative coup de grâce is a failure.
David-W-Fenton
@David Btw, please don't miss that I do appreciate you bringing up the flaws with decimal fields in Access.
Emtucifor
-1 for perpetuating the idea that `decimal` is a panacea for floating-point errors.
dan04
@dan04 It isn't a panacea, I agree, and I never claimed that it was. Read my words carefully about how it has a certain applicability. However, for the general ignorant user who isn't doing scientific research, the problems inherent in decimal are less tricky than those in floating point. You can't tell me that you think beginners to this subject are truly going to get along just as well with float as with decimal? If so, I'll have to politely ask what planet you're from. :)
Emtucifor
A planet where many programmers use languages that don't *have* a standard `decimal` type (e.g., C and C++) running on top of hardware with a binary FPU. And thus, where everyone uses `double` unless they *really* need decimal arithmetic. Heck, even Microsoft Excel uses doubles.
dan04
+2  A: 

Problem is sometimes Sum of field1 and field2 is value like: 9.5-10.3 and the result is -0.800000000000001. Could anybody explain why this happens and how to solve it?

Why this happens

The float and double types store numbers in base 2, not in base 10. Sometimes, a number can be exactly represented in a finite number of bits.

9.5 → 1001.1

And sometimes it can't.

10.3 → 1010.0 1001 1001 1001 1001 1001 1001 1001 1001...

In the latter case, the number will get rounded to the closest value that can be represented as a double:

1010.0100110011001100110011001100110011001100110011010 base 2
= 10.300000000000000710542735760100185871124267578125 base 10

When the subtraction is done in binary, you get:

-0.11001100110011001100110011001100110011001100110100000
= -0.800000000000000710542735760100185871124267578125

Output routines will usually hide most of the "noise" digits.

  • Python 3.1 rounds it to -0.8000000000000007.
  • SQLite 3.6 rounds it to -0.800000000000001.
  • printf %g rounds it to -0.8.

Note that, even on systems that display the value as -0.8, it's not the same as the best double approximation of -0.8, which is:

-0.11001100110011001100110011001100110011001100110011010
= -0.8000000000000000444089209850062616169452667236328125

So, in any programming language using double, the expression 9.5 - 10.3 == -0.8 will be false.
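Python's Decimal can convert a float exactly, which makes it easy to see the stored binary values described above:

```python
from decimal import Decimal

# Decimal(float) performs an exact conversion, exposing
# the value the double actually stores
print(Decimal(9.5))   # 9.5 -- exactly representable in base 2
print(Decimal(10.3))  # 10.300000000000000710542735760100185871124267578125
print(9.5 - 10.3 == -0.8)  # False
```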

The decimal non-solution

With questions like these, the most common answer is "use decimal arithmetic". This does indeed get better output in this particular example. Using Python's decimal.Decimal class:

>>> Decimal('9.5') - Decimal('10.3')
Decimal('-0.8')

However, you'll still have to deal with

>>> Decimal(1) / 3 * 3
Decimal('0.9999999999999999999999999999')
>>> Decimal(2).sqrt() ** 2
Decimal('1.999999999999999999999999999')

These may be more familiar rounding errors than the ones binary numbers have, but that doesn't make them less important.

In fact, binary fractions are more accurate than decimal fractions with the same number of bits, because decimal encodings waste part of every bit pattern (binary-coded decimal, for example, uses four bits per digit but represents only ten of the sixteen possible values).

Binary floating point is also much faster (on PCs) because it has dedicated hardware.

There is nothing special about base ten. It's just an arbitrary choice based on the number of fingers we have.

It would be just as accurate to say that a newborn baby weighs 0x7.5 lb (in more familiar terms, 7 lb 5 oz) as to say that it weighs 7.3 lb. (Yes, there's a 0.2 oz difference between the two, but it's within tolerance.) In general, decimal provides no advantage in representing physical measurements.

Money is different

Unlike physical quantities, which are measured to a certain level of precision, money is counted and is thus an exact quantity. The quirk is that it's counted in multiples of 0.01 instead of multiples of 1 like most other discrete quantities.

If your "10.3" really means $10.30, then you should use a decimal number type to represent the value exactly.
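For instance, a short Python sketch of exact money arithmetic with Decimal (the amounts are made up for illustration):

```python
from decimal import Decimal

# Money is counted in exact multiples of 0.01, so decimal fits naturally
price = Decimal('10.30')
print(price - Decimal('9.50'))  # 0.80 -- exact

# Repeated addition stays exact in decimal, but drifts in binary floats
total = sum([Decimal('0.10')] * 10, Decimal('0'))
print(total)            # 1.00
print(sum([0.1] * 10))  # 0.9999999999999999
```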

(Unless you're working with historical stock prices from the days when they were in 1/16ths of a dollar, in which case binary is adequate anyway ;-) )

Otherwise, it's just a display issue.

You got an answer correct to 15 significant digits. That's correct for all practical purposes. If you just want to hide the "noise", use the SQL ROUND function.

dan04
So, you're basically saying that decimal and float are essentially the same, and any differences in them are just a display issue... But can anyone read something like [What Every Computer Scientist Should Know About Floating-Point Arithmetic](http://docs.sun.com/source/806-3568/ncg_goldberg.html) and come away thinking that they are in fact essentially the same? Floating point has **all sorts** of issues that aren't in decimal, which instead rounds in a way that is "natural" to people who haven't studied all 33,000 words of a document like that to truly understand it.
Emtucifor
**In general**, people DO expect 9.5 - 10.3 to equal -0.8. And they understand the problems inherent in dividing by numbers that aren't divisible by ten, because they did it in math class and because the numbers are in fact displayed in decimal. Giving a purist answer like you have without consideration of what it means to the type of person likely to need this help is missing the mark, in my opinion. Another issue is that using Round() all over the place is its own kind of noise, one that can interfere with clarity and the facility with which one manipulates and transforms queries.
Emtucifor