tags:

views:

414

answers:

3

I have a data set consisting of time-stamped values, and absolute (meter) values. Sometimes the meter values reset to zero, which means I have to iterate through and calculate a delta one-by-one, and then add it up to get the total for a given period.

For example:

Timestamp      Value
2009-01-01     100
2009-01-02     105
2009-01-03     120
2009-01-04     0 
2009-01-05     9

the total here is 29, calculated as:

(105 - 100) + (120 - 105) + (0) + (9 - 0) = 29

I'm using MS-SQL server for this, and open to any suggestions.

Right now, I'm using a cursor to do this, which checks that the delta isn't negative, and then totals it up:

DECLARE CURSOR curTest CURSOR FAST_FORWARD FOR
    SELECT value FROM table ORDER BY timestamp
OPEN curTest
DECLARE @delta bigint, @current bigint, @last bigint
SET @delta = 0

FETCH curTest INTO @current
WHILE @@FETCH_STATUS = 0
BEGIN
    IF (@current IS NOT NULL) AND (@current > 0) 
    BEGIN
        IF (@last IS NOT NULL) AND (@current > @last)
            SET @delta = @delta + (@current - @last)
        SET @last = @current

        FETCH curTest INTO @current
    END
END

CLOSE curTest
DEALLOCATE curTest

It would be nice to get a data set like:

Timestamp      Value    LastValue
2009-01-01     100      NULL
2009-01-02     105      100
2009-01-03     120      105
2009-01-04     0        120
2009-01-05     9        0

as then it would be easy to grab the deltas, filter for (Value > LastValue), and do a SUM().

I tried:

SELECT m1.timestamp, m1.value, 
  ( SELECT TOP 1 m2.value FROM table WHERE m2.timestamp < m1.timestamp ORDER BY m2.timestamp DESC ) as LastValue
FROM table

but this actually turns out to be slower than the cursor: When I run these together in SQL studio with 'show execution plan' on, the relative cost of this is 100% (with 7 or 8 operations - the majority in a clustered index scan on timestamp), and the cursor is 0% (with 3 operations).

(What I'm not showing here for simplicity is that I have several different sets of numbers, with a foreign key in this table as well - so there is also always a WHERE clause limiting to a specific set. I have several places where I calculate these totals for a given time period for several sets at once, and thus it becomes quite the performance bottleneck. The non-cursor method can also be easily modified to GROUP BY the key and return all the sets at once - but this actually is even slower in my testing than running the cursor multiple times, because there is the additional overhead of the GROUP BY and SUM() operation, aside from it being slower overall anyways.)

+2  A: 

Start with row_number, then join back to yourself.

with numbered as
(
SELECT value, row_number() over (order by timestamp) as Rownum 
FROM table
)
select sum(n1.value - n2.value)
from numbered n1
  join
  numbered n2  on n1.Rownum = n2.Rownum +1

Actually... you only want to pick up increases... so put a WHERE clause in, saying "WHERE n1.value > n2.value".

And... make sure I've put them the right way around... I've just changed it from -1 to +1, because I think I had it flipped.

Easy!

Rob

Rob Farley
+2  A: 

Much the same...

create table #temp ([timestamp] date,value int);
insert into #temp (timestamp,value) values ('2009-01-01',100)
insert into #temp (timestamp,value) values ('2009-01-02',105)
insert into #temp (timestamp,value) values ('2009-01-03',120)
insert into #temp (timestamp,value) values ('2009-01-04',0)
insert into #temp (timestamp,value) values ('2009-01-05',9);

with numbered as
(
    select ROW_NUMBER() over (order by timestamp) id,value from #temp
)
select sum(n1.value-n2.value) from numbered n1 join numbered n2 on n1.id=n2.id+1 where n1.value!=0

drop table #temp;

Result is 29, as specified.

spender
+1 for the usage of row_number()
Rashmi Pandit
A: 

There are too many unnecessary joins in your algorithm.

Calculating the difference between each meter reading and its subsequent meter reading is a waste of resources. As a real world example, imagine if my electric company read my meter each day to how much electricity I used, and summed daily values to determine my monthly total - it just doesn't make sense. They simply determine the total based on the start value and the end value!

Simply calculate the difference between the first and last readings and adjust to account for the 'resets'. Your formula simply becomes:

total value = (final value) - (initial value) 
                 + (miscellaneous reductions in value, i.e. resets)
total value = (9) - (100) + (120)
            = 29

It's trivial to find the final value and initial value. Just find the total amount by which 'meter' was reduced during 'resets', and add this to the total. Unless there are more reset records than measurement records, this will always be more efficient.

To borrow from spender's solution, the 'reset' value could be calculated by

create table...

select sum(n1.value-n2.value) from numbered n1 join numbered n2 
     on n1.id=n2.id+1 where n1.value=0  //note value=0 rather than value!=0
Kirk Broadhurst