views:

2203

answers:

7

I am trying to have a running average column in the SELECT statement based on a column from the n previous rows in the same SELECT statement. The average I need is based on the n previous rows in the resultset.

Let me explain

Id        Number       Average
 1             1          NULL
 2             3          NULL
 3             2          NULL
 4             4             2 <----- Average of (1, 3, 2),Numbers from previous 3 rows
 5             6             3 <----- Average of (3, 2, 4),Numbers from previous 3 rows
 .             .             .
 .             .             .

The first 3 rows of the Average column are null because there are no previous rows. The row 4 in the Average column shows the average of the Number column from the previous 3 rows.

I need some help trying to construct a SQL Select statement that will do this.

+1  A: 

Check out some solutions here. I'm sure that you could adapt one of them easily enough.

Tom H.
+9  A: 

This should do it:

--Test Data
CREATE TABLE    RowsToAverage
    (
    ID int NOT NULL,
    Number int NOT NULL
    )

INSERT  RowsToAverage(ID, Number)
SELECT  1, 1
UNION ALL
SELECT  2, 3
UNION ALL
SELECT  3, 2
UNION ALL
SELECT  4, 4
UNION ALL
SELECT  5, 6
UNION ALL
SELECT  6, 8
UNION ALL
SELECT  7, 10

--The query
;WITH   NumberedRows
AS
(
SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
FROM    RowsToAverage rta
)

SELECT  nr.ID, nr.Number,
     CASE
      WHEN nr.RowNumber <=3 THEN NULL
      ELSE ( SELECT avg(Number) 
        FROM NumberedRows 
        WHERE RowNumber < nr.RowNumber
        AND  RowNumber >= nr.RowNumber - 3
       )
     END AS MovingAverage
FROM    NumberedRows nr
Aaron Alton
+1, it runs correctly on my system
KM
+1- Very elegant solution. MVP, MVP, MVP!
Jose Basilio
A: 

Edit: I missed the point that it should average the three previous records...

For a general running average, I think something like this would work:

SELECT
    id, number, 
    SUM(number) OVER (ORDER BY ID) / 
       ROW_NUMBER() OVER (ORDER BY ID) AS [RunningAverage]
FROM myTable
ORDER BY ID
richardtallent
when using @Aaron Alton's RowsToAverage table (I changed FROM MyTable to FROM RowsToAverage), I get an error: Msg 102, Level 15, State 1, Line 3 Incorrect syntax near 'order'.
KM
Which RDBMS are you using? Windowing functions are only available in SQL 2005 and greater.
Aaron Alton
I should add that the OP mentioned they're using SQL 2008.
Aaron Alton
A: 

If you want this to be truly performant, and arn't afraid to dig into a seldom-used area of SQL Server, you should look into writing a custom aggregate function. SQL Server 2005 and 2008 brought CLR integration to the table, including the ability to write user aggregate functions. A custom running total aggregate would be the most efficient way to calculate a running average like this, by far.

jrista
+4  A: 

Assuming that the Id column is sequential, here's a simplified query for a table named "MyTable":

SELECT 
    b.Id,
    b.Number,
    (
      SELECT 
       AVG(a.Number) 
      FROM 
       MyTable a 
     WHERE 
       a.id >= (b.Id - 3) 
       AND a.id < b.Id
       AND b.Id > 3 
     ) as Average
FROM 
    MyTable b;
ichiban
Uhmm. Assuming sequentials Ids, this works too. +1
Jose Basilio
This can also work if no rows were deleted in the table. I accepted Aaron Alton's solution because of the row_number() OVER (ORDER BY rta.ID ASC) works for all cases.
hpatel298
A: 

Alternatively you can denormalize and store precalculated running values. Described here:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/01/23/denormalizing-to-enforce-business-rules-running-totals.aspx

Performance of selects is as fast as it goes. Of course, modifications are slower.

AlexKuznetsov
+1  A: 

A simple self join would seem to perform much better than a row referencing subquery

Generate 10k rows of test data:

drop table test10k
create table test10k (Id int, Number int, constraint test10k_cpk primary key clustered (id))

;WITH digits AS (
    SELECT 0 as Number
    UNION SELECT 1
    UNION SELECT 2
    UNION SELECT 3
    UNION SELECT 4
    UNION SELECT 5
    UNION SELECT 6
    UNION SELECT 7
    UNION SELECT 8
    UNION SELECT 9
)
,numbers as (
    SELECT 
     (thousands.Number * 1000) 
     + (hundreds.Number * 100) 
     + (tens.Number * 10) 
     + ones.Number AS Number
    FROM digits AS ones 
    CROSS JOIN digits AS tens
    CROSS JOIN digits AS hundreds
    CROSS JOIN digits AS thousands
)
insert test10k (Id, Number)
select Number, Number
from numbers

I would pull the special case of the first 3 rows out of the main query, you can UNION ALL those back in if you really want it in the row set. Self join query:

;WITH   NumberedRows
AS
(
    SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM    test10k rta
)

SELECT  nr.ID, nr.Number,
    avg(trailing.Number) as MovingAverage
FROM    NumberedRows nr
    join NumberedRows as trailing on trailing.RowNumber between nr.RowNumber-3 and nr.RowNumber-1
where nr.Number > 3
group by nr.id, nr.Number

On my machine this takes about 10 seconds, the subquery approach that Aaron Alton demonstrated takes about 45 seconds (after I changed it to reflect my test source table) :

;WITH   NumberedRows
AS
(
    SELECT  rta.*, row_number() OVER (ORDER BY rta.ID ASC) AS RowNumber
    FROM    test10k rta
)
SELECT  nr.ID, nr.Number,
    CASE
            WHEN nr.RowNumber <=3 THEN NULL
            ELSE (  SELECT  avg(Number) 
                            FROM    NumberedRows 
                            WHERE   RowNumber < nr.RowNumber
                            AND             RowNumber >= nr.RowNumber - 3
                    )
    END AS MovingAverage
FROM    NumberedRows nr

If you do a SET STATISTICS PROFILE ON, you can see the self join has 10k executes on the table spool. The subquery has 10k executes on the filter, aggregate, and other steps.