views:

343

answers:

3

Suppose I have a SQL table of Awards, with fields for Date and Amount. I need to generate a table with a sequence of consecutive dates, the amount awarded in each day, and the running (cumulative) total.

Date         Amount_Total   Amount_RunningTotal
----------   ------------   -------------------
1/1/2010              100                   100
1/2/2010              300                   400
1/3/2010                0                   400
1/4/2010                0                   400
1/5/2010              400                   800
1/6/2010              100                   900
1/7/2010              500                  1400
1/8/2010              300                  1700

This SQL works, but isn't as quick as I'd like:

Declare @StartDate datetime, @EndDate datetime 
Select @StartDate=Min(Date), @EndDate=Max(Date) from Awards 

; With 

/* Returns consecutive from numbers 1 through the 
number of days for which we have data */
Nbrs(n) as (
   Select 1 Union All 
   Select 1+n 
   From Nbrs 
   Where n<=DateDiff(d,@StartDate,@EndDate)),

/* Returns all dates @StartDate to @EndDate */
AllDays as (
   Select Date=DateAdd(d, n, @StartDate) 
   From Nbrs ) 

/* Returns totals for each day */
Select 
 d.Date,
 Amount_Total = (
        Select Sum(a.Amount) 
        From Awards a 
        Where a.Date=d.Date),
 Amount_RunningTotal = (
        Select Sum(a.Amount) 
        From Awards a 
        Where a.Date<=d.Date)
From AllDays d
Order by d.Date 
Option(MAXRECURSION 1000)

I tried adding an index to Awards.Date, but it made a very minimal difference.

Before I resort to other strategies like caching, is there a more efficient way to code the running total calculation?

+2  A: 

I generally use a temporary table for this:

DECLARE @Temp TABLE
(
    [Date] date PRIMARY KEY,
    Amount int NOT NULL,
    RunningTotal int NULL
)

INSERT @Temp ([Date], Amount)
    SELECT [Date], Amount
    FROM ...

DECLARE @RunningTotal int

UPDATE @Temp
SET @RunningTotal = RunningTotal = @RunningTotal + Amount

SELECT * FROM @Temp

If you can't make the date column a primary key then you need to include an ORDER BY [Date] in the INSERT statement.

Also, this question's been asked a few times before. See here or search for "sql running total". The solution I posted is, as far as I know, still the one with the best performance, and also easy to write.

Aaronaught
Thanks very much - that's perfect. Amazing how fast that is.
Herb Caudill
I have never seen syntax like "SET @x = a = @x + b" before. For what RDBMSes does this work?
Dems
@Dems: This syntax is for SQL Server, but you can do something very similar in mysql (refer to the link near the bottom of my answer - where somebody posted a mysql version).
Aaronaught
A: 

I don't have a database setup in front of me so I hope the below works first shot. A pattern like this should result in a much speedier query...you're just joining twice, similar amount of aggregation:

Declare @StartDate datetime, @EndDate datetime 
Select @StartDate=Min(Date), @EndDate=Max(Date) from Awards 
; 
WITH AllDays(Date) AS (SELECT @StartDate UNION ALL SELECT DATEADD(d, 1, Date) 
                       FROM AllDays 
                       WHERE Date < @EndDate)

SELECT d.Date, sum(day.Amount) Amount_Total, sum(running.Amount) Amount_RunningTotal
FROM AllDays d  
     LEFT JOIN (SELECT date, SUM(Amount) As Amount
                FROM Awards 
                GROUP BY Date) day
          ON d.Date = day.Date
     LEFT JOIN (SELECT date, SUM(Amount) As Amount
                FROM Awards 
                GROUP BY Date) running 
                ON (d.Date >= running.Date)
Group by d.Date
Order by d.Date 

Note: I changed your table expression up top, it was leaving out the first day before...if this is intentional just slap a where clause on this to exclude it. Let me know in the comments if this doesn't work or doesn't fit and I'll make whatever adjustments.

Nick Craver
Good catch on the first-day thing - made the same change shortly after posting. I'd actually tried changes along the lines you propose here before (using group by instead of an inline query, and a one CTE instead of two), and surprisingly they didn't have much of a performance impact. The technique @Aaronaught posted is about 4x faster than anything else I've tried - I think mainly because of the date primary key on the temp table.
Herb Caudill
A: 

Here's a working solution based on @Aaronaught's answer. The only gotcha I had to overcome in T-SQL was that @RunningTotal etc. can't be null (need to be converted to zero).

Declare @StartDate datetime, @EndDate datetime 
Select @StartDate=Min(StartDate),@EndDate=Max(StartDate) from Awards

/* @AllDays: Contains one row per date from @StartDate to @EndDate */
Declare @AllDays Table (
    Date datetime Primary Key)
; With 
Nbrs(n) as (
    Select 0 Union All 
    Select 1+n from Nbrs 
    Where n<=DateDiff(d,@StartDate,@EndDate) 
    )
Insert into @AllDays
Select Date=DateAdd(d, n, @StartDate) 
From Nbrs
Option(MAXRECURSION 10000) /* Will explode if working with more than 10000 days (~27 years) */

/* @AmountsByDate: Contains one row per date for which we have an Award, along with the totals for that date */ 
Declare @AmountsByDate Table (
    Date datetime Primary Key,
    Amount money)
Insert into @AmountsByDate
Select 
    StartDate, 
    Amount=Sum(Amount) 
from Awards a
Group by StartDate

/* @Result: Joins @AllDays and @AmountsByDate etc. to provide totals and running totals for every day of the award */
Declare @Result Table (
    Date datetime Primary Key,
    Amount money,
    RunningTotal money)
Insert into @Result 
Select 
    d.Date,
    IsNull(bt.Amount,0),
    RunningTotal=0
from @AllDays d
Left Join @AmountsByDate bt on d.Date=bt.Date
Order by d.Date

Declare @RunningTotal money Set @RunningTotal=0
Update @Result Set @RunningTotal = RunningTotal = @RunningTotal + Amount

Select * from @Result 
Herb Caudill