views:

3060

answers:

6

Imagine the following table (called TestTable):

id     somedate    somevalue
--     --------    ---------
45     01/Jan/09   3
23     08/Jan/09   5
12     02/Feb/09   0
77     14/Feb/09   7
39     20/Feb/09   34
33     02/Mar/09   6

I would like a query that returns a running total in date order, like:

id     somedate    somevalue  runningtotal
--     --------    ---------  ------------
45     01/Jan/09   3          3
23     08/Jan/09   5          8
12     02/Feb/09   0          8
77     14/Feb/09   7          15  
39     20/Feb/09   34         49
33     02/Mar/09   6          55

I know there are various ways of doing this in SQL 2000/2005/2008.

I am particularly interested in this sort of method that uses the aggregating-set-statement trick:

INSERT INTO @AnotherTbl(id, somedate, somevalue, runningtotal) 
SELECT id, somedate, somevalue, null
FROM TestTable
ORDER BY somedate

DECLARE @RunningTotal int
SET @RunningTotal = 0

UPDATE @AnotherTbl
SET @RunningTotal = runningtotal = @RunningTotal + somevalue
FROM @AnotherTbl

... this is very efficient but I have heard there are issues around this because you can't necessarily guarantee that the UPDATE statement will process the rows in the correct order. Maybe we can get some definitive answers about that issue.

But maybe there are other ways that people can suggest?

+1  A: 
SELECT TOP 25   amount, 
 (SELECT SUM(amount) 
 FROM time_detail b 
 WHERE b.time_detail_id <= a.time_detail_id) AS Total FROM time_detail a

You can also use the ROW_NUMBER() function and a temp table to create an arbitrary column to use in the comparison on the inner SELECT statement.

Boo
This is really inefficient ... but then again there is no real clean way of doing this in sql server
Sam Saffron
Absolutely it is inefficient - but it does the job and there's no question of whether something for executed in the right or wrong order.
Boo
thanks, its useful to have alternative answers, and also useful to have efficienty critique
codeulike
A: 

Assuming that windowing works on SQL Server 2008 like it does elsewhere (that I've tried), give this a go:

select testtable.*, sum(somevalue) over(order by somedate)
from testtable
order by somedate;

MSDN says it's available in SQL Server 2008 (and maybe 2005 as well?) but I don't have an instance to hand to try it.

EDIT: well, apparently SQL Server doesn't allow a window specification ("OVER(...)") without specifying "PARTITION BY" (dividing the result up into groups but not aggregating in quite the way GROUP BY does). Annoying-- the MSDN syntax reference suggests that its optional, but I only have SqlServer 2000 instances around at the moment.

The query I gave works in both Oracle 10.2.0.3.0 and PostgreSQL 8.4-beta. So tell MS to catch up ;)

araqnid
Using OVER with SUM will not work in this case to give a running total. The OVER clause does not accept ORDER BY when used with SUM. You have to use PARTITION BY, which will not work for running totals.
Boo
thanks, its actually useful to hear why this wont work. araqnid maybe you could edit your answer to explain why its not an option
codeulike
+3  A: 

The problem is that the SQL Server implementation of the Over clause is somewhat limited.

Oracle (and ANSI-SQL) allow you to do things like:

 SELECT somedate, somevalue,
  SUM(somevalue) OVER(ORDER BY somedate 
     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) 
          AS RunningTotal
  FROM Table

SQL Server gives you no clean solution to this problem. My gut is telling me that this is one of those rare cases where a cursor is the fastest, though I will have to do some benchmarking on big results.

The update trick is handy but I feel its fairly fragile. It seems that if you are updating a full table then it will proceed in the order of the primary key. So if you set your date as a primary key ascending you will probably be safe. But you are relying on an undocumented SQL Server implementation detail (also if the query ends up being performed by two procs I wonder what will happen, see: MAXDOP):

Full working sample:

drop table #t 
create table #t ( ord int primary key, total int, running_total int)

insert #t(ord,total)  values (2,20)
-- notice the malicious re-ordering 
insert #t(ord,total) values (1,10)
insert #t(ord,total)  values (3,10)
insert #t(ord,total)  values (4,1)

declare @total int 
set @total = 0
update #t set running_total = @total, @total = @total + total 

select * from #t
order by ord 

ord         total       running_total
----------- ----------- -------------
1           10          10
2           20          30
3           10          40
4           1           41

You asked for a benchmark this is the lowdown.

The fastest SAFE way of doing this would be the Cursor, it is an order of magnitude faster than the correlated sub-query of cross-join.

The absolute fastest way is the UPDATE trick. My only concern with it is that I am not certain that under all circumstances the update will proceed in a linear way. There is nothing in the query that explicitly says so.

Bottom line, for production code I would go with the cursor.

Test data:

create table #t ( ord int primary key, total int, running_total int)

set nocount on 
declare @i int
set @i = 0 
while @i < 10000
begin
   insert #t (ord, total) values (@i,  rand() * 100) 
    set @i = @i +1
end

Test 1:

SELECT ord,total, 
    (SELECT SUM(total) 
        FROM #t b 
        WHERE b.ord <= a.ord) AS b 
FROM #t a

-- CPU 11731, Reads 154934, Duration 11135

Test 2:

SELECT a.ord, a.total, SUM(b.total) AS RunningTotal 
FROM #t a CROSS JOIN #t b 
WHERE (b.ord <= a.ord) 
GROUP BY a.ord,a.total 
ORDER BY a.ord

-- CPU 16053, Reads 154935, Duration 4647

Test 3:

DECLARE @TotalTable table(ord int primary key, total int, running_total int)

DECLARE forward_cursor CURSOR FAST_FORWARD 
FOR 
SELECT ord, total
FROM #t 
ORDER BY ord


OPEN forward_cursor 

DECLARE @running_total int, 
    @ord int, 
    @total int
SET @running_total = 0

FETCH NEXT FROM forward_cursor INTO @ord, @total 
WHILE (@@FETCH_STATUS = 0)
BEGIN
     SET @running_total = @running_total + @total
     INSERT @TotalTable VALUES(@ord, @total, @running_total)
     FETCH NEXT FROM forward_cursor INTO @ord, @total 
END

CLOSE forward_cursor
DEALLOCATE forward_cursor

SELECT * FROM @TotalTable

-- CPU 359, Reads 30392, Duration 496

Test 4:

declare @total int 
set @total = 0
update #t set running_total = @total, @total = @total + total 

select * from #t

-- CPU 0, Reads 58, Duration 139
Sam Saffron
Thanks. So your code sample is to demonstrate that it will sum in the order of the primary key, I presume. It would be interesting to know if cursors are still more efficient than joins for larger data sets.
codeulike
I'm sure you can find stats on google or your favorite search engine.. but the quick answer is that set-based operations are much faster than cursor operations when working with large sets of data.
Boo
@sambo99 - Thanks for all the extra detail
codeulike
+1  A: 

The following will produce the required results.

SELECT a.SomeDate, a.SomeValue, SUM(b.SomeValue) AS RunningTotal FROM TestTable a CROSS JOIN TestTable b WHERE (b.SomeDate <= a.SomeDate) GROUP BY a.SomeDate,a.SomeValue ORDER BY a.SomeDate,a.SomeValue

Having a clustered index on SomeDate will greatly improve the performance.

Dave Barker
@Dave I think this question is trying to find an efficient way of doing this, cross joining is going to be really slow for large sets
Sam Saffron
thanks, its useful to have alternative answers, and also useful to have efficienty critique
codeulike
+1  A: 

The APPLY operator in SQL 2005 and higher works for this:

select
    t.id ,
    t.somedate ,
    t.somevalue ,
    rt.runningTotal
from TestTable t
 cross apply (select sum(somevalue) as runningTotal
       from TestTable
       where somedate <= t.somedate
      ) as rt
order by t.somedate
Mike Forman
+1  A: 

You can also denormalize - store running totals in the same table:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx

Selects work much faster than any other solutions, but modifications may be slower

AlexKuznetsov