ansaurus

Question

Answer 1

+3 A:

SELECT t1.FileName, t1.CreatedDate, t2.CreatedDate as PrevCreatedDate
FROM 
   (SELECT FileName, CreateDate,
          ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
   FROM MyTable) t1
LEFT JOIN
   (SELECT FileName, CreateDate,
     ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
     FROM MyTable) t2
ON (t1.FileName = t2.FileName AND t1.OrderNo = t2.OrderNo - 1)

Or may be better use 'WITH', because queries is identical:

WITH t(ObjectID, FileName, CreatedDate, OrderNo) AS
   (SELECT ObjectID, FileName, CreatedDate,
          ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
   FROM MyTable) 
SELECT t1.ObjectID, t1.FileName, t1.CreatedDate, t2.CreatedDate AS PrevCreatedDate,
        DATEDIFF("SS", '1900-01-01 00:00:00', 
           COALESCE((t1.CreatedDate - t2.CreatedDate),0)) AS secondsTaken
FROM t t1 LEFT JOIN t t2 
ON (t1.FileName = t2.FileName AND t1.OrderNo = t2.OrderNo + 1)

Michael Pakhantsov 2010-08-06 14:28:00

Won't this only work if the files are entered sequentially?

JNK 2010-08-06 14:30:52

Worth trying this and looking at the plans but I'm not sure this will be any more efficient than what the OP already has.

Martin Smith 2010-08-06 14:32:36

@Martin, unfortunately mssql have only several windowed functions. in oracle same query will look more elegantly: SELECT FileName, CreateDate, LAG(CreatedDate, 1, NULL) OVER(PARTITION BY FileName ORDER BY CreatedDate) AS PrevCreatedDate FROM MyTable. But LAG does not exists in mssql

Michael Pakhantsov 2010-08-06 14:45:31

@Michael - Plans here http://img686.imageshack.us/img686/3167/plansgn.jpg

Martin Smith 2010-08-06 14:48:49

... though when I added a composite index on `FileName, CreatedDate` they both came out at the same cost. I'll do a bit more testing.

Martin Smith 2010-08-06 15:14:10

@Martin Smith - I would imagine the amount of rows to process using the CROSS APPLY would quickly skyrocket. With given ROW_NUMBER solution and proper indexing, the amount of rows processed would roughly be n*2. For larger tables, I would think the ROW_NUMBER solution performs much better. OP Should try it (and post a query plan).

Lieven 2010-08-06 15:17:57

@Martin, thanks. Do you have plan for original query?

Michael Pakhantsov 2010-08-06 15:19:14

@Michael - That was the first one in that image. When I added the composite index on `FileName, CreatedDate` and a few thousands rows of test data though yours looks like it might be the slight winner though it's very close http://img715.imageshack.us/img715/3774/plansb.jpg

Martin Smith 2010-08-06 15:31:44

@Martin, thanks!

Michael Pakhantsov 2010-08-06 15:46:36

@Lieven - Sorry I'm coming to realise that for some reason the figure that SQL Server reports as the "query cost (relative to the batch)" for these `row_number` queries is orders of magnitude different from the actual cost. For some reason it seems unable to realise that the join on rownum=rownum+1 will yield one row and this puts its cost estimates way off the actual. When I look at actual reads, writes, and cpu your's and Michael's are about the same and both massively better than the OPs original query.

Martin Smith 2010-08-06 16:39:44

@Michael - I just made a couple of minor edits that I noticed when I was testing the two queries (I think the `-` should have been a `+` and I added in the additional column). Hope that's OK.

Martin Smith 2010-08-06 18:16:58

This works great... Thanks

The King 2010-08-09 14:29:32

Answer 2

+2 A:

I think Michael's answer should indeed prove more efficient. When evaluating efficiency though I just want to draw attention to an issue that caught me out and has probably in fact led me to incorrectly discounting row_number as an option in the past for this type of "previous row" query.

I set up a test table with 23,174 rows and ran the query in the question and Michael's. Looking at the "query cost (relative to the batch)" in the actual execution plan the original query was 0% and Michael's 100% cost and so appears to be massively inefficient.

However the actual stats tell a completely different story

Cross Apply Approach

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Table 'MyTable'. Scan count 23175, logical reads 49335, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

ROW_NUMBER approach

SQL Server Execution Times: CPU time = 391 ms, elapsed time = 417 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'MyTable'. Scan count 2, logical reads 148, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

In the ROW_NUMBER plan the Merge Join on rownumber=rownumber+1 has 23,174 rows going in to the top and bottom and actual rows out is 23,174 as well. For Some reason SQL Server estimates that the rows out will be 34,812,000 though and I think that is the reason that its reported cost figure is wildly inaccurate.

Test Script

BEGIN TRAN

CREATE TABLE MyTable(
    [ObjectID] [INT] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    [FILENAME] [VARCHAR](50) NULL,
    [CreatedDate] [DATETIME] NULL)  GO

INSERT INTO MyTable
SELECT ISNULL(TYPE,NEWID())     ,
DATEADD(DAY, CAST(RAND(CAST(NEWID() AS VARBINARY))*10000 AS INT),GETDATE()) 
FROM master.dbo.spt_values
GO 10

DELETE FROM MyTable
   WHERE EXISTS(SELECT * FROM MyTable m2 WHERE MyTable.CreatedDate = m2.CreatedDate AND MyTable.FILENAME = M2.FILENAME AND MyTable.ObjectID < M2.ObjectID)

CREATE UNIQUE NONCLUSTERED  INDEX [IX_MyTable] ON MyTable ([FILENAME] ASC,[CreatedDate] ASC)


SET STATISTICS IO ON
SET STATISTICS TIME ON

    SELECT  A.ObjectID, 
        A.FileName,
        A.CreatedDate AS CreatedDate, 
        B.PrevRowCreatedDate,
        DATEDIFF("SS", '1900-01-01 00:00:00', COALESCE((A.CreatedDate - B.PrevRowCreatedDate),0)) AS secondsTaken
    INTO #A
    FROM MyTable AS A 
        CROSS APPLY (       
        (SELECT PrevRowCreatedDate = MAX(CreatedDate) FROM MyTable AS BB 
                        WHERE   BB.FileName = A.FileName AND 
                                BB.CreatedDate < A.CreatedDate
        )
        ) AS B  
;

WITH t(ObjectID, FileName, CreatedDate, OrderNo) AS
   (SELECT ObjectID, FileName, CreatedDate,
          RANK() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
   FROM MyTable) 
SELECT t1.ObjectID, t1.FileName, t1.CreatedDate, t2.CreatedDate AS PrevCreatedDate,
        DATEDIFF("SS", '1900-01-01 00:00:00', COALESCE((t1.CreatedDate - t2.CreatedDate),0)) AS secondsTaken
INTO #B
FROM t t1 LEFT JOIN t t2 
ON (t1.FileName = t2.FileName AND t1.OrderNo = t2.OrderNo + 1)


/*Test the 2 queries give the same result*/
SELECT * FROM #A
EXCEPT
SELECT * FROM #B

SELECT * FROM #B
EXCEPT
SELECT * FROM #A

ROLLBACK

Martin Smith 2010-08-06 17:50:26

+1 @Martin, thanks a lot for whole analysis.

Michael Pakhantsov 2010-08-07 07:22:11

ansaurus

tags:

views:

answers:

Sql Query Pervious Row Optimisation

Cross Apply Approach

ROW_NUMBER approach

Test Script

related questions