I'm sure this has been asked before, but I was having a hard time finding it.

I have multiple groups of duplicates in one table (3 records for one code, 2 for another, etc.), i.e. several codes that each have more than one row.

Below is what I came up with to delete them, but I have to run the script for however many duplicates there are:

set rowcount 1
delete from Table
where code in (
  select code from Table 
  group by code
  having (count(code) > 1)
)
set rowcount 0

This works to a degree: I have to run it once for every group of duplicates, and each run deletes only 1 row (which is all I need right now).

I appreciate your help/comments!

+4  A: 

If you have a key column on the table, you can use it to uniquely identify the "distinct" rows in your table.

Just use a subquery to build the list of IDs for the unique rows and then delete everything outside that set. Something along these lines:

create table #TempTable
(
    ID int identity(1,1) not null primary key,
    SomeData varchar(100) not null
)

insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData1')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData2')
insert into #TempTable(SomeData) values('someData3')
insert into #TempTable(SomeData) values('someData4')

select * from #TempTable

--Records to be deleted
SELECT ID
FROM #TempTable
WHERE ID NOT IN
(
    select MAX(ID)
    from #TempTable
    group by SomeData
)

--Delete them
DELETE
FROM #TempTable
WHERE ID NOT IN
(
    select MAX(ID)
    from #TempTable
    group by SomeData
)

--Final Result Set
select * from #TempTable

drop table #TempTable;

Alternatively, you could use a CTE, for example:

WITH UniqueRecords AS
(
    select MAX(ID) AS ID
    from #TempTable
    group by SomeData
)
DELETE A
FROM #TempTable A
    LEFT outer join UniqueRecords B on
        A.ID = B.ID
WHERE B.ID IS NULL
John Sansom
Great detail - thank you very much!
Dan
+1 for the CTE solution - most elegant in my opinion!
marc_s
@Dan: You're most welcome. @marc_s: Thank you sir!
John Sansom
Just noticed that the OP is on SQL Server 2000 anyway, but it's possible to do this more efficiently with a CTE, as shown here: http://stackoverflow.com/questions/18932/sql-how-can-i-remove-duplicate-rows/3822833#3822833
Martin Smith
+1  A: 
SET ROWCOUNT 1    
DELETE Table    
FROM Table a    
WHERE (SELECT COUNT(*) FROM Table b WHERE b.Code = a.Code ) > 1    
WHILE @@rowcount > 0    
  DELETE Table    
  FROM Table a    
  WHERE (SELECT COUNT(*) FROM Table b WHERE b.Code = a.Code ) > 1    
SET ROWCOUNT 0

This will delete all duplicate rows, leaving one row per Code. You can add more columns to the comparison if you want duplicates matched on them as well.
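
As a rough sketch of what adding a column to the comparison could look like: the table name MyTable and the second column Name are made up for illustration (they are not from the question), and the delete target is written as the alias a. Rows where Name is NULL never compare equal with =, so they would not be treated as duplicates here.

```sql
-- Hypothetical table MyTable(Code, Name); both names are assumptions.
SET ROWCOUNT 1          -- every DELETE below removes at most one row

-- Remove one row whose (Code, Name) pair occurs more than once
DELETE a
FROM MyTable a
WHERE (SELECT COUNT(*)
       FROM MyTable b
       WHERE b.Code = a.Code
         AND b.Name = a.Name) > 1

-- Repeat for as long as the previous DELETE still removed a row
WHILE @@ROWCOUNT > 0
    DELETE a
    FROM MyTable a
    WHERE (SELECT COUNT(*)
           FROM MyTable b
           WHERE b.Code = a.Code
             AND b.Name = a.Name) > 1

SET ROWCOUNT 0          -- restore the default (no row limit)
```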

Azhar
"DELETE Table FROM Table"??? Have you run this script?
vgv8
ya... it's working fine
Azhar
A: 

It is frequently more efficient to copy the unique rows into a temporary table, drop the source table, and rename the temporary table back.

(I reused the definition and data of #TempTable, called SrcTable here, since it is impossible to rename a temporary table into a regular one.)

create table SrcTable
(
    ID int identity(1,1) not null primary key,
    SomeData varchar(100) not null
)

insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData1')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData2')
insert into SrcTable(SomeData) values('someData3')
insert into SrcTable(SomeData) values('someData4')

(The table definition and data above are by John Sansom, from the previous answer.)

-- cloning "unique" part
SELECT * INTO TempTable 
FROM SrcTable --original table
WHERE id IN  
(SELECT MAX(id) AS ID
FROM SrcTable
GROUP BY SomeData);
GO

DROP TABLE SrcTable
GO

EXEC sp_rename 'TempTable', 'SrcTable'
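
One thing to watch with this approach: SELECT ... INTO copies the columns and data but not constraints or indexes, so the renamed table no longer has its primary key. A minimal sketch of putting it back, assuming the SrcTable definition above; the constraint name PK_SrcTable is my own choice:

```sql
-- SELECT ... INTO does not copy constraints, so re-create the primary key
-- on the renamed table. PK_SrcTable is a hypothetical constraint name.
ALTER TABLE SrcTable
ADD CONSTRAINT PK_SrcTable PRIMARY KEY (ID)
```

Any other indexes, foreign keys, triggers or permissions on the original table would need to be re-created in the same way.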
vgv8
A: 

Alternatively, you can use the ROW_NUMBER() function to filter out duplicates:

;WITH [CTE_DUPLICATES] AS 
(
SELECT RN = ROW_NUMBER() OVER (PARTITION BY SomeData ORDER BY SomeData)
FROM #TempTable
) 
DELETE FROM [CTE_DUPLICATES] WHERE RN > 1
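
Because ORDER BY SomeData inside a partition on SomeData does not distinguish the rows, which copy survives is arbitrary. A small variation, assuming the #TempTable definition from the first answer (with its ID column), orders by ID so that the newest row in each group is kept. The CTE name RankedRows is just an illustrative choice, and like any CTE it needs SQL Server 2005 or later.

```sql
;WITH RankedRows AS
(
    -- Number the rows within each SomeData group, highest ID first
    SELECT ID,
           RN = ROW_NUMBER() OVER (PARTITION BY SomeData ORDER BY ID DESC)
    FROM #TempTable
)
-- RN = 1 is the row with the highest ID in its group; delete the rest
DELETE FROM RankedRows WHERE RN > 1
```

On the sample data this keeps the rows with ID 2, 5, 6 and 7, the same set as the MAX(ID) approach.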
anivas
The OP for this question is on SQL Server 2000.
Martin Smith