views:

129

answers:

4

How to delete duplicate records in SQL?

+3  A: 

Here is how to do it in Oracle, using ROWID. Different flavours of RDBMS will have their own equivalent.

I start by creating some duplicate records ...

SQL> select t, count(*) from t23 group by t;

T       COUNT(*)
----- ----------
09:00          2
12:00          2
10:30          2
11:00          2
12:30          2
08:00          2
10:45          2
11:15          2

8 rows selected.

SQL>

... and now I zap them, using T to define "duplicate records"...

SQL> delete from t23
  2  where rowid > ( select min(rowid) from t23 x
  3                  where x.t = t23.t )
  4  /

8 rows deleted.

SQL> select t, count(*) from t23 group by t;

T       COUNT(*)
----- ----------
09:00          1
12:00          1
10:30          1
11:00          1
12:30          1
08:00          1
10:45          1
11:15          1

8 rows selected.

SQL>

Note that in the sub-query you have to include as many columns as necessary to specify what constitutes uniqueness. This could end up being the whole record, although one would hope not.
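To make the multi-column case concrete, here is a minimal sketch that can be run without an Oracle instance. It swaps in SQLite (via Python's sqlite3 module), whose implicit rowid plays a similar role to Oracle's ROWID here. The table bookings and its columns room, t and note are made up for illustration; a row counts as a duplicate only when both room and t match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: "duplicate" means same room AND same time.
conn.execute("CREATE TABLE bookings (room TEXT, t TEXT, note TEXT)")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [
        ("A", "09:00", "first"),
        ("A", "09:00", "second"),  # duplicate of (A, 09:00)
        ("B", "09:00", "third"),   # same time, different room: not a duplicate
        ("B", "10:30", "fourth"),
        ("B", "10:30", "fifth"),   # duplicate of (B, 10:30)
    ],
)

# Every column that defines uniqueness appears in the correlated sub-query.
# The row with the lowest rowid in each (room, t) group is kept.
deleted = conn.execute(
    """
    DELETE FROM bookings
    WHERE rowid > (SELECT MIN(x.rowid)
                   FROM bookings x
                   WHERE x.room = bookings.room
                     AND x.t = bookings.t)
    """
).rowcount

remaining = conn.execute(
    "SELECT room, t, note FROM bookings ORDER BY room, t"
).fetchall()
print(deleted, remaining)
```

Dropping the x.room condition from the sub-query would treat ("B", "09:00") as a duplicate of ("A", "09:00") and delete it too, which is why the choice of columns matters.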

Incidentally, the most efficient way of doing this is not to have duplicate records in the first place. Which is why Nature gave us primary keys and unique constraints.

APC
+1 for mentioning creating some sort of row uniqueness
Matt
A: 

select col from table;

select distinct col from table;

Philluminati
+1  A: 

Since you don't have a PK on the table (assuming your rows are 100% duplicated), you won't have any problems with other tables referencing your table with a FOREIGN KEY.

The fastest and least complicated way of doing this is:

SELECT DISTINCT *
INTO #tmp
FROM YourTable;

TRUNCATE TABLE YourTable;

INSERT INTO YourTable
SELECT * FROM #tmp;

Maybe consider adding some version of this statement to the end ;-)

ALTER TABLE YourTable ADD CONSTRAINT PK_YourTable PRIMARY KEY (whatever, keeps, this, from, happening, again);
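For anyone who wants to try the copy-out, empty, copy-back pattern without a SQL Server instance, here is a rough sketch using SQLite through Python's sqlite3 module. Several things differ from the T-SQL above and are substitutions, not part of the original answer: SQLite has no SELECT ... INTO or TRUNCATE TABLE, so the sketch uses CREATE TEMP TABLE ... AS and an unqualified DELETE, and because SQLite cannot add a primary key to an existing table it uses a unique index to stop duplicates from coming back. The table your_table and columns a and b are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with fully duplicated rows and no key.
conn.execute("CREATE TABLE your_table (a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, "x"), (1, "x"), (2, "y"), (2, "y"), (2, "y"), (3, "z")],
)

# 1. Copy the distinct rows aside.
conn.execute("CREATE TEMP TABLE tmp AS SELECT DISTINCT * FROM your_table")
# 2. Empty the original table (stand-in for TRUNCATE TABLE).
conn.execute("DELETE FROM your_table")
# 3. Copy the distinct rows back and drop the scratch table.
conn.execute("INSERT INTO your_table SELECT * FROM tmp")
conn.execute("DROP TABLE tmp")
# 4. Stand-in for the primary key: a unique index over the columns.
conn.execute("CREATE UNIQUE INDEX ux_your_table ON your_table (a, b)")
conn.commit()

rows = conn.execute("SELECT a, b FROM your_table ORDER BY a").fetchall()

# A further duplicate insert should now be refused.
try:
    conn.execute("INSERT INTO your_table VALUES (1, 'x')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, duplicate_rejected)
```

After the rebuild the table holds one copy of each distinct row, and the attempt to re-insert (1, 'x') fails with an integrity error.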
Dave Markle
+2  A: 

In SQL Server 2005 and above:

WITH    q AS
        (
        SELECT  *, ROW_NUMBER() OVER (PARTITION BY dup_column ORDER BY dup_column) AS rn
        FROM    mytable
        )
DELETE
FROM    q
WHERE   rn > 1
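The same ROW_NUMBER idea can be tried outside SQL Server. This is only a sketch under a few assumptions: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), and since SQLite does not allow deleting through a CTE, it deletes by rowid from a derived table instead. It also orders by rowid inside each partition so that the earliest inserted row is the one kept, which is a choice made for this example. The payload column is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (dup_column TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("a", "p1"), ("a", "p2"), ("b", "p3"),
     ("c", "p4"), ("c", "p5"), ("c", "p6")],
)

# Number the rows within each dup_column group, then delete every row
# whose number is greater than 1. The derived table q mirrors the CTE.
deleted = conn.execute(
    """
    DELETE FROM mytable
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY dup_column
                                      ORDER BY rowid) AS rn
            FROM mytable
        ) AS q
        WHERE rn > 1
    )
    """
).rowcount

remaining = conn.execute(
    "SELECT dup_column, payload FROM mytable ORDER BY dup_column"
).fetchall()
print(deleted, remaining)
```

With ORDER BY dup_column, as in the SQL Server version, all rows in a partition tie, so which duplicate survives is not defined. Ordering by something that distinguishes the rows makes the outcome predictable.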
Quassnoi
great blog article about this: http://explainextended.com/2009/03/14/deleting-duplicates/
KM
huh. never thought this would be allowed... but lo and behold, it works. +1
Dave Markle
@KM: Think I need to read it! :)
Quassnoi
@Quassnoi, I bet you've read it :)
KM