tags:

views:

85

answers:

2

I would like to delete records which are considered duplicates based on them having the same value in a certain column and keep one which is considered the newest based on InsertedDate in my example below. I would like a solution which doesn't use a cursor but is set based. Goal: delete all duplicates and keep the newest.

The ddl below creates some duplicates. The records which need to be deleted are: John1 & John2 because they have the same ID as John3 and John3 is the newest record.

Also record John5 needs to be deleted because there's another record with ID = 3 and is newer (John6).

Create table dbo.TestTable (ID int, InsertedDate DateTime, Name varchar(50))

Insert into dbo.TestTable Select 1, '07/01/2009', 'John1'
Insert into dbo.TestTable Select 1, '07/02/2009', 'John2'
Insert into dbo.TestTable Select 1, '07/03/2009', 'John3'
Insert into dbo.TestTable Select 2, '07/03/2009', 'John4'
Insert into dbo.TestTable Select 3, '07/05/2009', 'John5'
Insert into dbo.TestTable Select 3, '07/06/2009', 'John6'
+1  A: 

This works:

delete t 
from TestTable t
left join 
(
    select id, InsertedDate = max(InsertedDate) from TestTable
    group by id
) as sub on sub.id = t.id and sub.InsertedDate = t.InsertedDate
where sub.id is null

If you have to deal with ties it gets a tiny bit trickier.

Sam Saffron
Thanks. I won't have ties in my case. I chose your answer because I am not yet familiar with the newer features of SQL Server.
Tony_Henrich
+3  A: 

Just as an academic exercise:

with cte as (
   select *, row_number() over (partition by ID order by InsertedDate desc) as rn
   from TestTable)
delete from cte
where rn <> 1;

Most of the time the solution proposed by Sam performs much better.

Remus Rusanu
I prefer this solution as it handles ties, which can be controlled to some degree with the ORDER BY if there's anything else that might hint as to which to keep in the case of a tie.
Timothy Walters