views:

257

answers:

3

I've got a table defined like this :

create table #tbFoo
(bar float)

And I'm looking for a way to round every value contained in column bar without changing the total sum (which is known to be an integer, or very close to an integer because of float number precision).

Rounding every value to the nearest integer won't work (ex : 1,5;1,5 will be rounded to 1;1 or 2;2)

It's quite easy to do this using several requests (eg storing the original sum, rounding, computing the new sum, and updating as many rows as needed to go back to the original sum), but this is not a very elegant solution.

Is there a way to do this using a single SQL request?


I'm using SQL Server 2008, so solutions taking advantage of this specific vendor are welcome.


Edit : I'm looking for a request minimizing the differences between the old values and the new ones. In other words, a value should never be rounded up if a greater value has been rounded down, and vice-versa

+3  A: 

Update:

See this solution explained in more details in the article in my blog:


You need to keep cumulative offset for each value:

1.2   (1 + 0.0)  ~ 1    1   1.2   +0.2
1.2   (1 + 0.2)  ~ 1    2   2.4   +0.4
1.2   (1 + 0.4)  ~ 1    3   3.6   +0.6
1.2   (1 + 0.6)  ~ 2    5   4.8   -0.2
1.2   (1 - 0.2)  ~ 1    6   6.0   0.0

This is easily done in MySQL, but in SQL Server you will have to write a cursor or use cumulative subselects (which are less efficient).

Update:

The query below selects the difference between the sums of the values and of those rounded down to the nearest smaller integer.

This gives us the number (N) of values we should round up.

Then we order the values by their fractional part (ones that are closer to their ceiling go first) and round the first N up, the others down.

SELECT  value,
        FLOOR(value) + CASE WHEN ROW_NUMBER() OVER (ORDER BY value - FLOOR(value) DESC) <= cs THEN 1 ELSE 0 END AS nvalue
FROM    (
        SELECT  cs, value
        FROM    (
                SELECT  SUM(value) - SUM(FLOOR(value)) AS cs
                FROM    @mytable
                ) c
        CROSS JOIN
                @mytable
        ) q

Here's the script for the test data:

SET NOCOUNT ON
GO
SELECT  RAND(0.20090917)
DECLARE @mytable TABLE (value FLOAT NOT NULL)
DECLARE @cnt INT;
SET @cnt = 0;
WHILE @cnt < 100
BEGIN
        INSERT
        INTO    @mytable
        VALUES  (FLOOR(RAND() * 100) / 10)
        SET @cnt = @cnt + 1
END

INSERT
INTO    @mytable
SELECT  600 - SUM(value)
FROM    @mytable
Quassnoi
@Quassnoi : thanks for your answer. With your solution, 1.4/1.4/1.2 would be rounded to 1/1/2 which isn't 'fair' since 1.4 is greater than 1.2.
Brann
@Quassnoi : that's exactly the kind of thing I was looking for. Thanks !
Brann
@Quassnoi : in my realworld problem, I've got another column 'ListID', and I want the sum per ListID to remain constant. I've adapted your request using group by clauses, and it almost works, BUT the ROW_NUMBER applies to my whole request, instead of the grouped by request. Is there something I can do?
Brann
@Brann: could you please post some sample data and the desired resultset?
Quassnoi
`@Brann`: meanwhile, try to replace `OVER (ORDER BY value)` with `OVER (PARTITTION BY listID ORDER BY value)`. This will return you the row numbers groupwise.
Quassnoi
@Quassnoi : that's what I was looking for ; thanks !
Brann
A: 

First get the difference between the rounded sum and the actual sum, and the number of records:

declare @Sum float, @RoundedSum float, @Cnt int

select @Sum = sum(bar), @RoundedSum = sum(round(bar)), @Cnt = count(*)
from #tbFoo

Then you spread the difference equally on all values before rounding:

declare @Offset float

set @Offset = (@Sum - @RoundedSum) / @Cnt

select bar = round(bar + @Offset)
from #tbFoo
Guffa
@guffa : if I'm not mistaking : 0.2/0.2/0.2/0.2/0.2 , sum=1, roundedsum=0, offest=0.2, finalresulst = 0/0/0/0/0, finalsum=0.
Brann
@Brann: Yes, you are right, this doesn't give the right result. However, the principle of spreading the deviation equally is sound. I'll give it some more though...
Guffa
+1  A: 

If you have a list of n values whose elements are accurate only to within an integer value (+-0.5), then any sum of those elements will have a cumulative error or +-(n*0.5). If you have 6 elements in your list which should add up to some number, then your worst case scenario is that you're off by 3 if you just add the integer values.

If you find some way of showing 10.2 as 11 in order to make the sum work, you've changed the precision of that element from +-0.5 to +-0.8, which is counterintuitive when looking at integers?

One possible solution to think about is to round your number during display only (using some format string on your output), not already at the retrieval stage. Each number will be as close as possible to the actual value, but the sum will be more correct too.

Example: If you have 3 values of 1/3 each, displayed as whole-numbered percentages, then you should be showing 33, 33 and 33. To do anything else is to create a margin of error greater than +-0.5 for any individual value. Your total should still be displayed as 100%, because that is the best possible value (as opposed to working with sums of already rounded values)

Also, be aware that by using a float, you've already introduced a limitation on your precision because you have no way of accurately representing 0.1. For more on that, read What Every Computer Scientist Should Know About Floating-Point Arithmetic

Galghamon
@Galghamon: Yes, showing 10.2 as 11 is exactly what I'm trying to achieve. And that's what the user wants to see, too.
Brann