tags:

views:

974

answers:

3

Take the tsql query below:

DECLARE @table TABLE(data VARCHAR(20))
INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')    
INSERT INTO @table VALUES ('duplicate row')    
INSERT INTO @table VALUES ('second duplicate row')    
INSERT INTO @table VALUES ('second duplicate row')

SELECT   data    
INTO     #duplicates    
FROM     @table    
GROUP BY data    
HAVING   COUNT(*) > 1    

-- delete all rows that are duplicated   
DELETE   FROM @table    
FROM     @table o INNER JOIN #duplicates d
         ON d.data = o.data         

-- insert one row for every duplicate set    
INSERT   INTO @table(data)    
         SELECT   data    
         FROM     #duplicates

I understand what it is doing, but the last part of logic (after --insert one row for every duplicate set), doesn't make sense. Where we have the set of code for --delete all rows that are duplicated, that gets rid of the duplicates so what's the part of the last section?

This query was found here

Thanks

+4  A: 

Where we have the set of code for --delete all rows that are duplicated, that gets rid of the duplicates so what's the part of the last section?

First, it deletes all rows that ever had duplicates. That is, all rows, and original also. In the case above, only one row ('not duplicate row') will remain in the table after DELETE. All four other rows will be deleted.

Then is populates the table with the deleted rows again, but now the duplicates are removed.

This is not the best way to delete duplicates.

The best way is:

WITH q AS (
          SELECT data, ROW_NUMBER() OVER (PARTITION BY data ORDER BY data) AS rn
          FROM @table
          )
DELETE
FROM      q
WHERE     rn > 1
Quassnoi
Makes perfect sense. As a variation of the above query, how can I get every duplicate record? E.g. duplicate record is listed twice; how can I get both values?
You mean, select all instances of rows that have duplicate values?
Quassnoi
A: 

The delete call deletes all matching records.

Because all duplicated rows have been deleted the last piece reinserts a single row.

blowdart
A: 
Create table Test (Test1 int not null , Test2 varchar(10) null )


Insert Into Test 

Select 12, 'abc'

UNion All 

Select 13 , 'def'



Insert Into Test 

Select 12, 'abc'

UNion All 

Select 13 , 'def'



Select * From Test 

WITH t1 AS 

(SELECT ROW_NUMBER ( ) OVER ( PARTITION BY test1, test2 ORDER BY test1) 

AS RNUM FROM Test )

DELETE FROM t1 WHERE RNUM > 1
Gr8.. it worked well