ansaurus

Question

Complicated SQL query for a running total column.

Answer 1

+4 A:

The operation is linear in the number of payments for each customer. So, you are going to have to go over each payment, keeping a running total and a high water mark and at the end of all the payments, you will have your answer. Whether you do that in a CLR stored procedure (immediately jumped to mind for me) or use a cursor or temp table or whatever, it's probably not going to be fast.

If you have to run this report over and over again, you should seriously consider keeping a high water mark field and update it (or not) whenever a payment comes in. That way, your report will be trivial -- but this is what data marts are for.

JP Alioto 2009-05-02 03:36:18
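
The single pass described in this answer can be sketched in a few lines of Python. This is only an illustration, not the answerer's code: the function name `peak_balances` and the `(customer_id, amount)` tuple layout are assumptions, and the input is assumed to be already sorted by payment date within each customer.

```python
from collections import defaultdict


def peak_balances(payments):
    """Return {customer_id: (running_total, high_water_mark)}.

    `payments` is an iterable of (customer_id, amount) tuples, assumed to be
    sorted by payment date within each customer (hypothetical layout).
    """
    running = defaultdict(int)
    high = {}
    for customer_id, amount in payments:
        running[customer_id] += amount
        # The first payment sets the mark; later payments can only raise it.
        if customer_id not in high or running[customer_id] > high[customer_id]:
            high[customer_id] = running[customer_id]
    return {c: (running[c], high[c]) for c in running}


payments = [(1, 1), (1, 2), (1, -1), (1, 2), (1, -3), (2, 10), (2, -5), (2, 7)]
result = peak_balances(payments)
print(result)  # {1: (1, 4), 2: (12, 12)}
```

Each payment is visited exactly once, which is the linear cost the answer refers to.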

JP, thanks for this. I've never tried a CLR stored procedure. My understanding is that in general, for things like this you want to avoid CLR because it just won't be as fast as TSQL optimized. I think we're going to end up storing the precalculated data somewhere.

Linus 2009-05-02 21:10:23

Answer 2

+1 A:

list = list of amounts ordered by date
running = 0
high = 0
foreach list as amount
  running += amount
  if running > high
    high = running

To keep it fast, you will need a running total per customer that a trigger increments by each payment's amount, plus a high value per customer (a trigger can update that too, which makes the re-query even simpler).

I don't think you can do this type of thing without code (stored procedures are code).

jim 2009-05-02 03:40:36
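
A rough sketch of the trigger idea, using SQLite through Python's standard `sqlite3` module so that it runs anywhere; a SQL Server trigger would be written differently. The `customer_balance` table, the trigger name, and the column names are all hypothetical. It also assumes payments are inserted in chronological order and that the high value starts at 0, so a customer whose balance never goes positive would report 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE payments (
    paymentid   INTEGER PRIMARY KEY AUTOINCREMENT,
    customerid  INTEGER NOT NULL,
    paymentdate TEXT    NOT NULL,
    amount      NUMERIC NOT NULL
);

-- Hypothetical summary table: one row per customer.
CREATE TABLE customer_balance (
    customerid INTEGER PRIMARY KEY,
    running    NUMERIC NOT NULL DEFAULT 0,
    high       NUMERIC NOT NULL DEFAULT 0
);

CREATE TRIGGER payments_after_insert AFTER INSERT ON payments
BEGIN
    -- Make sure the customer has a summary row.
    INSERT OR IGNORE INTO customer_balance (customerid)
    VALUES (NEW.customerid);
    -- Both SET expressions see the pre-update values of running and high.
    UPDATE customer_balance
       SET running = running + NEW.amount,
           high    = MAX(high, running + NEW.amount)
     WHERE customerid = NEW.customerid;
END;
""")

conn.executemany(
    "INSERT INTO payments (customerid, paymentdate, amount) VALUES (?, ?, ?)",
    [
        (1, "2009-01-01", 1), (1, "2009-01-02", 2), (1, "2009-01-03", -1),
        (1, "2009-01-04", 2), (1, "2009-01-05", -3),
        (2, "2009-01-01", 10), (2, "2009-01-02", -5), (2, "2009-01-03", 7),
    ],
)

summary = conn.execute(
    "SELECT customerid, running, high FROM customer_balance ORDER BY customerid"
).fetchall()
print(summary)  # [(1, 1, 4), (2, 12, 12)]
```

With the summary table kept current on every insert, the report becomes a plain lookup. Back-dated payments would break the chronological-order assumption and would call for recomputing that customer's row.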

Lou, I'm actually fine with the idea of using a stored procedure or a function that returns the data. What I sort of wanted to avoid was using cursors in the stored proc. Thanks.

Linus 2009-05-02 21:12:33

Answer 3

+6 A:

Your question seems to be this:

SELECT CustomerID, SUM(Amount) FROM table WHERE Amount > 0 GROUP BY CustomerID
SELECT CustomerID, SUM(Amount) FROM table GROUP BY CustomerID

However, I think you mean that you want a table that appears like this

Customer  Payment  HighPoint  RunningTotal
123       5        5          5
123       5        10         10
123       -3       10         7

In which case I would create a view with the two selects above, so that the view is something like:

SELECT t.CustomerID, 
  t.PaymentDate, 
  t.Amount, 
  -- sum of the positive payments up to this date
  (SELECT SUM(pos.Amount) 
    FROM table AS pos 
    WHERE pos.Amount > 0 
      AND pos.PaymentDate <= t.PaymentDate 
      AND pos.CustomerID = t.CustomerID) AS HighPoint, 
  -- sum of all payments up to this date
  (SELECT SUM(rt.Amount) 
    FROM table AS rt 
    WHERE rt.CustomerID = t.CustomerID 
      AND rt.PaymentDate <= t.PaymentDate) AS RunningTotal
FROM table AS t

Also, you may consider a non-unique index on the Amount column of the table to speed up the view.

jellomonkey 2009-05-02 03:43:31

This doesn't answer the question at all

jim 2009-05-02 03:46:44

how does it not answer the question?

jellomonkey 2009-05-02 03:47:52

wtf... did you edit? PaymentDate wasn't even in your select let alone any clear formatting pre comment.. SO bug?

jim 2009-05-02 03:51:32

I did edit, I copied and pasted and then realized I grabbed the wrong thing.

jellomonkey 2009-05-02 03:52:39

Haha that explains the confusion!

jim 2009-05-03 08:36:18

Answer 4

+1 A:

Like Andomar's answer, you can compute the running total for each payment, then find the payment at which it peaks:

with
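rt as (
  select
    Payments.PaymentID,
    Payments.CustomerID,
    Payments.PaymentDate,
    Payments.Amount,
    isnull(sum(p.Amount), 0) + Payments.Amount as running
  from
    Payments
    left outer join Payments p on Payments.CustomerID = p.CustomerID
      and p.PaymentDate <= Payments.PaymentDate
      and p.PaymentID < Payments.PaymentID
  -- the aggregate needs every non-aggregated column in the group by
  group by
    Payments.PaymentID, Payments.CustomerID,
    Payments.PaymentDate, Payments.Amount
),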
highest as
(
  select
    CustomerID, PaymentID, running as peak_paid
  from
    rt
  where
    PaymentID = (select top 1 rt2.PaymentID 
        from rt rt2 
        where rt2.CustomerID = rt.CustomerID
        order by rt2.running desc, rt2.PaymentDate, rt2.PaymentID)
)

select
  *,
  (select sum(amount) from Payments where Payments.CustomerID = highest.CustomerID) as total_paid  
from
  highest;

However, since you have around 1 million payments, this could be quite slow. As others are saying, you would want to store the CustomerID, PaymentID and peak_paid in a separate table. That table could be updated on each Payment insert or by a scheduled SQL job.

Updated query to use join instead of subqueries. Since the PaymentDate does not have a time, I filter out multiple payments on the same day by the PaymentId.

dotjoe 2009-05-02 04:21:58

dotjoe, that looks pretty good. The report is not run too often, so it may be worth benchmarking this query. The one issue I see is that the running total calculation does not work if a customer makes more than one payment on a single day (which happens quite often), and we do not store the time component of the payment date. Those are details, though; we can figure something out from this.

Linus 2009-05-02 21:17:36

if the PaymentID is auto-incremented maybe you could use that?

dotjoe 2009-05-02 22:12:59
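
A small sketch of that tie-break, run against SQLite through Python's `sqlite3` module rather than SQL Server. It assumes that `paymentid` increases in the order payments were received and that dates are stored as ISO strings; the lowercase table and column names are illustrative. Earlier dates always count, and within the same date only rows with a lower or equal `paymentid` count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    " paymentid INTEGER PRIMARY KEY,"
    " customerid INTEGER, paymentdate TEXT, amount NUMERIC)"
)
# Two payments on the same day, then a refund the next day.
conn.executemany(
    "INSERT INTO payments (customerid, paymentdate, amount) VALUES (?, ?, ?)",
    [(123, "2009-01-01", 5), (123, "2009-01-01", 5), (123, "2009-01-02", -3)],
)

rows = conn.execute("""
    SELECT cur.paymentid, cur.paymentdate, SUM(prev.amount) AS balance
    FROM payments cur
    JOIN payments prev
      ON prev.customerid = cur.customerid
     AND (prev.paymentdate < cur.paymentdate
          OR (prev.paymentdate = cur.paymentdate
              AND prev.paymentid <= cur.paymentid))
    GROUP BY cur.paymentid, cur.paymentdate
    ORDER BY cur.paymentdate, cur.paymentid
""").fetchall()
print(rows)  # [(1, '2009-01-01', 5), (2, '2009-01-01', 10), (3, '2009-01-02', 7)]
```

Grouping by `paymentid` rather than by date keeps the two same-day payments as separate rows, each with its own balance.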

Answer 5

+4 A:

As an alternative to subqueries, you can use a running total query. Here's how I set one up for this case. First create some test data:

create table #payments (
    paymentid int identity,
    customerid int,
    paymentdate datetime,
    amount decimal
)

insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-01',1.00)
insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-02',2.00)
insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-03',-1.00)
insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-04',2.00)
insert into #payments (customerid,paymentdate,amount) values (1,'2009-01-05',-3.00)
insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-01',10.00)
insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-02',-5.00)
insert into #payments (customerid,paymentdate,amount) values (2,'2009-01-03',7.00)

Now you can execute the running total query, which calculates the balance for each customer after each payment:

select cur.customerid, cur.paymentdate, sum(prev.amount)
from #payments cur
inner join #payments prev
    on cur.customerid = prev.customerid
    and cur.paymentdate >= prev.paymentdate
group by cur.customerid, cur.paymentdate

This generates data:

Customer  Paymentdate        Balance after payment
1         2009.01.01         1
1         2009.01.02         3
1         2009.01.03         2
1         2009.01.04         4
1         2009.01.05         1
2         2009.01.01         10
2         2009.01.02         5
2         2009.01.03         12

To look at the maximum, you can do a group by on the running total query:

select customerid, max(balance)
from (
    select cur.customerid, cur.paymentdate, balance = sum(prev.amount)
    from #payments cur
    inner join #payments prev
     on cur.customerid = prev.customerid
     and cur.paymentdate >= prev.paymentdate
    group by cur.customerid, cur.paymentdate
) runningtotal
group by customerid

Which gives:

Customer   Max balance
1          4
2          12

Hope this is useful.

Andomar 2009-05-02 12:45:42

Thank you Andomar. I'll give this a shot.

Linus 2009-05-02 21:22:03

+1 i like this much better than the sub queries

dotjoe 2009-05-02 22:12:49

I like the #payments self-join. I still think MS should support "sum(x) over(order by xx)" though. (Can't that be used here where there is a column to partition on?)

araqnid 2009-05-22 13:08:51
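
For reference, SQL Server added an ordered `SUM(...) OVER (...)` in the 2012 release, after this thread was written. The sketch below shows the idea using SQLite through Python's `sqlite3` module, which needs SQLite 3.25 or later for window functions. The table layout mirrors the `#payments` test data above, and ordering by `paymentid` as a tie-break is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    " paymentid INTEGER PRIMARY KEY,"
    " customerid INTEGER, paymentdate TEXT, amount NUMERIC)"
)
conn.executemany(
    "INSERT INTO payments (customerid, paymentdate, amount) VALUES (?, ?, ?)",
    [
        (1, "2009-01-01", 1), (1, "2009-01-02", 2), (1, "2009-01-03", -1),
        (1, "2009-01-04", 2), (1, "2009-01-05", -3),
        (2, "2009-01-01", 10), (2, "2009-01-02", -5), (2, "2009-01-03", 7),
    ],
)

rows = conn.execute("""
    SELECT customerid, MAX(balance) AS peak
    FROM (
        -- Running total per customer, restarted for each partition.
        SELECT customerid,
               SUM(amount) OVER (
                   PARTITION BY customerid
                   ORDER BY paymentdate, paymentid
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS balance
        FROM payments
    ) AS runningtotal
    GROUP BY customerid
    ORDER BY customerid
""").fetchall()
print(rows)  # [(1, 4), (2, 12)]
```

The window version reads each partition once instead of joining every payment to all earlier ones, which is why it tends to scale better than the self-join.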
