
I need to delete duplicate rows from a database table. Can I do it with a simple SQL query? If not, please show me a quick algorithm to do it.

Example:

id| field_one | field_two |
1 | 0000000   | 11111111  |
2 | 2222222   | 33333333  |
3 | 2222222   | 33333333  |
4 | 4444444   | 55555555  |

I need to delete the row with id 2 (or 3, it doesn't matter which; they are equal, but not both). Thanks for any help.

+1  A: 
delete from the_table where id in
   (select max(id) from the_table
      group by field_one, field_two
      having count(*) > 1)

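As pointed out in the comments, each run deletes only one row per duplicated group (the one with the highest id), so if a row appears three or more times, some extra copies survive. You can run this (heavy) query repeatedly until it stops deleting rows, or wait for a better answer...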
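To make the repeat-until-done idea concrete, here is a minimal sketch using SQLite through Python's `sqlite3` module (an assumption; the question doesn't name a database). It loads the example rows plus a hypothetical third copy with id 5 and reruns the query above until `rowcount` reports that nothing was deleted.

```python
import sqlite3


def dedupe_by_repeating(conn: sqlite3.Connection) -> int:
    """Rerun the MAX(id)-per-group DELETE until it removes nothing; return the number of passes."""
    passes = 0
    while True:
        cur = conn.execute(
            """
            DELETE FROM the_table WHERE id IN
              (SELECT MAX(id) FROM the_table
               GROUP BY field_one, field_two
               HAVING COUNT(*) > 1)
            """
        )
        passes += 1
        if cur.rowcount == 0:  # no duplicate group left
            return passes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER PRIMARY KEY, field_one TEXT, field_two TEXT)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [
        (1, "0000000", "11111111"),
        (2, "2222222", "33333333"),
        (3, "2222222", "33333333"),
        (4, "4444444", "55555555"),
        (5, "2222222", "33333333"),  # hypothetical third copy
    ],
)
passes = dedupe_by_repeating(conn)
remaining = [row[0] for row in conn.execute("SELECT id FROM the_table ORDER BY id")]
print(passes, remaining)
```

With three copies it takes two deleting passes (ids 5, then 3) plus a final pass that deletes nothing and only confirms no duplicate groups remain.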

Thilo
+1 - Was in the middle of posting exactly the same answer. Speed counts!
Barry
And if you have the same row three times?
Parkyprg
Thanks for the super-fast answer :) That should work.
Scorpil
@Parkyprg: good point.
Thilo
@Parkyprg Thank god I haven't :) But for the sake of knowledge, I would be glad if someone showed the solution for N duplicates.
Scorpil
A: 

Maybe this will help you: http://support.microsoft.com/kb/139444

Jeff Norman
+2  A: 

First select one row (the one with the highest id) for each distinct combination of values, then delete all the other rows:

DELETE FROM MyTable 
WHERE id NOT IN
      (
        SELECT MAX(id) FROM MyTable
        GROUP BY field_one, field_two
      )
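A minimal sketch of this single-statement version, again assuming SQLite via Python's `sqlite3` and the same hypothetical third copy (id 5). Unlike the `HAVING COUNT(*) > 1` query, one pass is enough no matter how many copies a row has.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, field_one TEXT, field_two TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "0000000", "11111111"),
        (2, "2222222", "33333333"),
        (3, "2222222", "33333333"),
        (4, "4444444", "55555555"),
        (5, "2222222", "33333333"),  # hypothetical third copy
    ],
)
# Keep the highest id of every (field_one, field_two) group and delete everything else.
deleted = conn.execute(
    """
    DELETE FROM MyTable
    WHERE id NOT IN
          (SELECT MAX(id) FROM MyTable
           GROUP BY field_one, field_two)
    """
).rowcount
remaining = [row[0] for row in conn.execute("SELECT id FROM MyTable ORDER BY id")]
print(deleted, remaining)
```

Here ids 2 and 3 are removed in a single statement and ids 1, 4 and 5 survive.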
Parkyprg
+1. That would work with rows duplicated more than once. It could be quite slow, though, if most rows are not duplicated. I suppose it is good to have both queries in your arsenal and choose according to the situation at hand.
Thilo
Another useful approach when there are many duplicates could be to copy the "good" rows into a working/staging table, truncate the old one, and then copy the good rows back. That avoids fragmentation.
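A rough sketch of the staging-table approach, assuming SQLite via Python's `sqlite3`. SQLite has no `TRUNCATE`, so an unfiltered `DELETE FROM` stands in for it; the temporary table name `staging` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, field_one TEXT, field_two TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "0000000", "11111111"),
        (2, "2222222", "33333333"),
        (3, "2222222", "33333333"),
        (4, "4444444", "55555555"),
        (5, "2222222", "33333333"),  # hypothetical third copy
    ],
)
# 1. Copy one "good" row per (field_one, field_two) group into a staging table.
conn.execute(
    """
    CREATE TEMP TABLE staging AS
    SELECT MAX(id) AS id, field_one, field_two
    FROM MyTable
    GROUP BY field_one, field_two
    """
)
# 2. Empty the original table (SQLite's stand-in for TRUNCATE).
conn.execute("DELETE FROM MyTable")
# 3. Copy the good rows back and drop the staging table.
conn.execute(
    "INSERT INTO MyTable (id, field_one, field_two) SELECT id, field_one, field_two FROM staging"
)
conn.execute("DROP TABLE staging")
conn.commit()
remaining = [row[0] for row in conn.execute("SELECT id FROM MyTable ORDER BY id")]
print(remaining)
```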
Thilo
+1  A: 

Thilo's answer is a useful one; it does exactly what you want. However, if you have many rows it could take a long time, as the algorithm has quadratic complexity. If I were the person who asked, I would choose Thilo's answer as the best answer; I just want to give you another option. If you have many rows, another possibility is:

Create a new table with a UNIQUE INDEX on the column combination (field_one, field_two) and copy the contents of the first table into the new one, skipping the rows that would violate the index (for example with MySQL's INSERT IGNORE). Then drop the old table and rename the new one to the old table name.
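A minimal sketch of the new-table-and-rename approach, assuming SQLite via Python's `sqlite3`, where `INSERT OR IGNORE` plays the role of skipping rows that violate the unique index. The names `MyTable_new` and `ux_fields` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER PRIMARY KEY, field_one TEXT, field_two TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, "0000000", "11111111"),
        (2, "2222222", "33333333"),
        (3, "2222222", "33333333"),
        (4, "4444444", "55555555"),
        (5, "2222222", "33333333"),  # hypothetical third copy
    ],
)
# 1. New table (hypothetical name) with a UNIQUE index on the column combination.
conn.execute("CREATE TABLE MyTable_new (id INTEGER PRIMARY KEY, field_one TEXT, field_two TEXT)")
conn.execute("CREATE UNIQUE INDEX ux_fields ON MyTable_new (field_one, field_two)")
# 2. Copy everything in id order; OR IGNORE skips rows that would violate the index,
#    so the first (lowest-id) copy of each pair is the one that is kept.
conn.execute("INSERT OR IGNORE INTO MyTable_new SELECT * FROM MyTable ORDER BY id")
# 3. Drop the old table and give the new one its name.
conn.execute("DROP TABLE MyTable")
conn.execute("ALTER TABLE MyTable_new RENAME TO MyTable")
conn.commit()
remaining = [row[0] for row in conn.execute("SELECT id FROM MyTable ORDER BY id")]
print(remaining)
```

A side benefit of this design is that the unique index stays on the renamed table, so new duplicates are rejected from then on.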

That's all.

Ervin
+2  A: 
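-- T-SQL (SQL Server): delete one duplicate row at a time until none remain
SET ROWCOUNT 1  -- limit each DELETE to a single row
DELETE a1 FROM userTbl1 a1
WHERE (SELECT COUNT(UName) FROM userTbl1 a2 WHERE a2.UName = a1.UName) > 1
WHILE @@ROWCOUNT > 0  -- repeat as long as the previous DELETE removed a row
    DELETE a1 FROM userTbl1 a1
    WHERE (SELECT COUNT(UName) FROM userTbl1 a2 WHERE a2.UName = a1.UName) > 1
SET ROWCOUNT 0  -- restore the default (no row limit)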
AsifQadri