ansaurus

Question

SQL Server Comparing Subsequent Rows for Duplicates

Answer 1

+1 A:

Something like

select TaskID,StatusCode,Min(TimeStamp)
from table
group by TaskID,StatusCode
order by 1,2

Note that is statuscode can duplicate, you will need an additional field, but hopefully this can point you in the right direction...

Sparky 2009-12-07 17:41:42

thanks for your reply Sparky. That is really close but the issue with this is in the table i only want to ignore the remaining row for a given status for a task if its consecutive rows following have the same statuscode. Eg. A task can go to statuscode 2 then to 3 then back to 2 and these are all valid rows i want to show as the statuscode 2 didn't happen consecutively. I think your query will give me the min timestamp for the given status on that task. This is why my query is so strange due to the consecutive rows issue.

Iffy 2009-12-07 17:49:04

Answer 2

A:

Something like the following should get you in the right direction....

CREATE TABLE #T
(
    TaskId INT
    ,StatusCode INT
    ,StatusTimeStamp DATETIME
)

INSERT INTO #T
SELECT 1, 1, '2009-12-01 14:20'
UNION SELECT 1, 2, '2009-12-01 16:20'
UNION SELECT 1, 2, '2009-12-02 09:15'
UNION SELECT 1, 2, '2009-12-02 12:15'
UNION SELECT 1, 3, '2009-12-02 18:15'

;WITH CTE AS
(
SELECT TaskId
     ,StatusCode
     ,StatusTimeStamp
     ,ROW_NUMBER() OVER (PARTITION BY TaskId, StatusCode ORDER BY TaskId, StatusTimeStamp DESC)  AS RNUM
FROM  #T
)
SELECT TaskId
     ,StatusCode
     ,StatusTimeStamp
FROM CTE
WHERE RNUM = 1

DROP TABLE #T

old account 2009-12-07 17:49:28

Thanks Jay, this is a very smart query but again this has the same issue as the query above by Sparky where it groups the statuscode and taskid together and only returns one row for each status code. An example would be if i was to put the rowUNION SELECT 1, 2, '2009-12-02 19:15';into your query where this status code has already been inserted but this one is not consecutive to the others so should show. What the query above is group this row in with the other ones.

Iffy 2009-12-07 18:14:33

I was thinking maybe of writing a cursor where i keep track of the previous statuscode and if its the same as the current i don't enter into my temp table else if its different then i do the entry into my temp table. At the end of this i will have a temp table with the data that i want be it with a slower query due to the cursor. This all depends though on how long the cursor takes to do this operation for a coupld hundred rows if i go that way?

Iffy 2009-12-07 18:15:34

Answer 3

A:

This are a couple ways of getting what you want:

SELECT
     T1.task_id,
     T1.status_code,
     T1.status_timestamp
FROM
     My_Table T1
LEFT OUTER JOIN My_Table T2 ON
     T2.task_id = T1.task_id AND
     T2.status_timestamp < T1.status_timestamp
LEFT OUTER JOIN My_Table T3 ON
     T3.task_id = T1.task_id AND
     T3.status_timestamp < T1.status_timestamp AND
     T3.status_timestamp > T2.status_timestamp
WHERE
     T3.task_id IS NULL AND
     (T2.status_code IS NULL OR T2.status_code <> T1.status_code)
ORDER BY
     T1.status_timestamp

or

SELECT
     T1.task_id,
     T1.status_code,
     T1.status_timestamp
FROM
     My_Table T1
LEFT OUTER JOIN My_Table T2 ON
     T2.task_id = T1.task_id AND
     T2.status_timestamp = (
          SELECT
               MAX(status_timestamp)
          FROM
               My_Table T3
          WHERE
               T3.task_id = T1.task_id AND
               T3.status_timestamp < T1.status_timestamp)
WHERE
     (T2.status_code IS NULL OR T2.status_code <> T1.status_code)
ORDER BY
     T1.status_timestamp

Both methods rely on there being no exact matches of the status_timestamp values (two rows can't have the same exact status_timestamp for a given task_id.)

Tom H. 2009-12-07 19:27:48

Thanks Tom, i used the second query with some mods to work for my purpose but it did the trick and was really quick too.Thanks a lot for you help.

Iffy 2009-12-09 17:49:54

you're welcome!

Tom H. 2009-12-09 18:10:50

ansaurus

tags:

views:

answers:

SQL Server Comparing Subsequent Rows for Duplicates

related questions