views:

68

answers:

2

Here is my table structure

MyTable
-----------

ObjectID int (Identity),           -- Primary Key
FileName varchar(10),
CreatedDate datetime
...........
...........
...........

I need to get the time taken to create record in a file... ie... Time elapsed between the previous record in the same file and the current record of the same file

ie... If the records are

ObjectID    FileName    CreatedDate (just showing the time part here)
--------    --------    -----------
1           ABC         10:23
2           ABC         10:25
3           DEF         10:26
4           ABC         10:30
5           DEF         10:31
6           DEF         10:35

The required output is...

ObjectID    FileName    CreatedDate     PrevRowCreatedDate
--------    --------    ----------- ---------------
1           ABC         10:23           NULL
2           ABC         10:25           10:23
3           DEF         10:26           NULL
4           ABC         10:30           10:25
5           DEF         10:31           10:26
6           DEF         10:35           10:31

So far I got this query, but it is taking a very long time than expected... Is there a better way to do it...

    Select  A.ObjectID, 
        A.FileName
        A.CreatedDate as CreatedDate, 
        B.PrevRowCreatedDate,
        datediff("SS", '1900-01-01 00:00:00', Coalesce((A.CreatedDate - B.PrevRowCreatedDate),0)) as secondsTaken
    from MyTable as A 
        Cross Apply (       
        (Select PrevRowCreatedDate = Max(CreatedDate) from MyTable as BB 
                        where   BB.FileName = A.FileName and 
                                BB.CreatedDate < A.CreatedDate
        )
        ) as B  

Please let me know incase you need more information

Thanks

+3  A: 
SELECT t1.FileName, t1.CreatedDate, t2.CreatedDate as PrevCreatedDate
FROM 
   (SELECT FileName, CreateDate,
          ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
   FROM MyTable) t1
LEFT JOIN
   (SELECT FileName, CreateDate,
     ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
     FROM MyTable) t2
ON (t1.FileName = t2.FileName AND t1.OrderNo = t2.OrderNo - 1)

Or may be better use 'WITH', because queries is identical:

WITH t(ObjectID, FileName, CreatedDate, OrderNo) AS
   (SELECT ObjectID, FileName, CreatedDate,
          ROW_NUMBER() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
   FROM MyTable) 
SELECT t1.ObjectID, t1.FileName, t1.CreatedDate, t2.CreatedDate AS PrevCreatedDate,
        DATEDIFF("SS", '1900-01-01 00:00:00', 
           COALESCE((t1.CreatedDate - t2.CreatedDate),0)) AS secondsTaken
FROM t t1 LEFT JOIN t t2 
ON (t1.FileName = t2.FileName AND t1.OrderNo = t2.OrderNo + 1)
Michael Pakhantsov
Won't this only work if the files are entered sequentially?
JNK
Worth trying this and looking at the plans but I'm not sure this will be any more efficient than what the OP already has.
Martin Smith
@Martin, unfortunately mssql have only several windowed functions. in oracle same query will look more elegantly: SELECT FileName, CreateDate, LAG(CreatedDate, 1, NULL) OVER(PARTITION BY FileName ORDER BY CreatedDate) AS PrevCreatedDate FROM MyTable. But LAG does not exists in mssql
Michael Pakhantsov
@Michael - Plans here http://img686.imageshack.us/img686/3167/plansgn.jpg
Martin Smith
... though when I added a composite index on `FileName, CreatedDate` they both came out at the same cost. I'll do a bit more testing.
Martin Smith
@Martin Smith - I would imagine the amount of rows to process using the CROSS APPLY would quickly skyrocket. With given ROW_NUMBER solution and proper indexing, the amount of rows processed would roughly be n*2. For larger tables, I would think the ROW_NUMBER solution performs much better. OP Should try it (and post a query plan).
Lieven
@Martin, thanks. Do you have plan for original query?
Michael Pakhantsov
@Michael - That was the first one in that image. When I added the composite index on `FileName, CreatedDate` and a few thousands rows of test data though yours looks like it might be the slight winner though it's very close http://img715.imageshack.us/img715/3774/plansb.jpg
Martin Smith
@Martin, thanks!
Michael Pakhantsov
@Lieven - Sorry I'm coming to realise that for some reason the figure that SQL Server reports as the "query cost (relative to the batch)" for these `row_number` queries is orders of magnitude different from the actual cost. For some reason it seems unable to realise that the join on rownum=rownum+1 will yield one row and this puts its cost estimates way off the actual. When I look at actual reads, writes, and cpu your's and Michael's are about the same and both massively better than the OPs original query.
Martin Smith
@Michael - I just made a couple of minor edits that I noticed when I was testing the two queries (I think the `-` should have been a `+` and I added in the additional column). Hope that's OK.
Martin Smith
This works great... Thanks
The King
+2  A: 

I think Michael's answer should indeed prove more efficient. When evaluating efficiency though I just want to draw attention to an issue that caught me out and has probably in fact led me to incorrectly discounting row_number as an option in the past for this type of "previous row" query.

I set up a test table with 23,174 rows and ran the query in the question and Michael's. Looking at the "query cost (relative to the batch)" in the actual execution plan the original query was 0% and Michael's 100% cost and so appears to be massively inefficient.

Execution Plans

However the actual stats tell a completely different story

Cross Apply Approach

SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
Table 'MyTable'. Scan count 23175, logical reads 49335, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

ROW_NUMBER approach

SQL Server Execution Times: CPU time = 391 ms, elapsed time = 417 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'MyTable'. Scan count 2, logical reads 148, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

In the ROW_NUMBER plan the Merge Join on rownumber=rownumber+1 has 23,174 rows going in to the top and bottom and actual rows out is 23,174 as well. For Some reason SQL Server estimates that the rows out will be 34,812,000 though and I think that is the reason that its reported cost figure is wildly inaccurate.

Test Script

BEGIN TRAN

CREATE TABLE MyTable(
    [ObjectID] [INT] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    [FILENAME] [VARCHAR](50) NULL,
    [CreatedDate] [DATETIME] NULL)  GO

INSERT INTO MyTable
SELECT ISNULL(TYPE,NEWID())     ,
DATEADD(DAY, CAST(RAND(CAST(NEWID() AS VARBINARY))*10000 AS INT),GETDATE()) 
FROM master.dbo.spt_values
GO 10

DELETE FROM MyTable
   WHERE EXISTS(SELECT * FROM MyTable m2 WHERE MyTable.CreatedDate = m2.CreatedDate AND MyTable.FILENAME = M2.FILENAME AND MyTable.ObjectID < M2.ObjectID)

CREATE UNIQUE NONCLUSTERED  INDEX [IX_MyTable] ON MyTable ([FILENAME] ASC,[CreatedDate] ASC)


SET STATISTICS IO ON
SET STATISTICS TIME ON

    SELECT  A.ObjectID, 
        A.FileName,
        A.CreatedDate AS CreatedDate, 
        B.PrevRowCreatedDate,
        DATEDIFF("SS", '1900-01-01 00:00:00', COALESCE((A.CreatedDate - B.PrevRowCreatedDate),0)) AS secondsTaken
    INTO #A
    FROM MyTable AS A 
        CROSS APPLY (       
        (SELECT PrevRowCreatedDate = MAX(CreatedDate) FROM MyTable AS BB 
                        WHERE   BB.FileName = A.FileName AND 
                                BB.CreatedDate < A.CreatedDate
        )
        ) AS B  
;

WITH t(ObjectID, FileName, CreatedDate, OrderNo) AS
   (SELECT ObjectID, FileName, CreatedDate,
          RANK() OVER(PARTITION BY FileName ORDER BY CreatedDate) AS OrderNo
   FROM MyTable) 
SELECT t1.ObjectID, t1.FileName, t1.CreatedDate, t2.CreatedDate AS PrevCreatedDate,
        DATEDIFF("SS", '1900-01-01 00:00:00', COALESCE((t1.CreatedDate - t2.CreatedDate),0)) AS secondsTaken
INTO #B
FROM t t1 LEFT JOIN t t2 
ON (t1.FileName = t2.FileName AND t1.OrderNo = t2.OrderNo + 1)


/*Test the 2 queries give the same result*/
SELECT * FROM #A
EXCEPT
SELECT * FROM #B

SELECT * FROM #B
EXCEPT
SELECT * FROM #A

ROLLBACK
Martin Smith
+1 @Martin, thanks a lot for whole analysis.
Michael Pakhantsov