Hello

I have a table that has a lot of duplicates in the Name column. I'd like to only keep one row for each.

The following lists the duplicates, but I don't know how to delete the duplicates and just keep one:

SELECT name FROM members GROUP BY name HAVING COUNT(*) > 1;

Thank you.

A: 

It would probably be easier to select the unique ones into a new table, drop the old table, then rename the temp table to replace it.

#create a table with same schema as members
CREATE TABLE tmp (...);

#insert the unique records
INSERT INTO tmp SELECT * FROM members GROUP BY name;

#swap it in
RENAME TABLE members TO members_old, tmp TO members;

#drop the old one
DROP TABLE members_old;
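
Since the asker turned out to be on SQLite, which has no RENAME TABLE statement, here is a minimal sketch of the same copy-and-swap idea using Python's built-in sqlite3 module. The columns (id, name, email) and the sample rows are made up for illustration, and the swap uses ALTER TABLE ... RENAME TO instead. Treat it as one possible adaptation, not a drop-in script.

```python
import sqlite3


def dedupe_by_swap(conn):
    """Copy one row per name into a new table, then swap it in for members."""
    conn.executescript("""
        BEGIN;
        -- Hypothetical schema; use your real column definitions here.
        CREATE TABLE tmp (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
        -- With a single MIN() aggregate, SQLite takes the other columns
        -- from the row that holds the minimum id for each name.
        INSERT INTO tmp (id, name, email)
            SELECT MIN(id), name, email FROM members GROUP BY name;
        DROP TABLE members;
        ALTER TABLE tmp RENAME TO members;
        COMMIT;
    """)


# Illustrative data only: "ann" appears twice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO members (name, email) VALUES (?, ?)",
    [("ann", "a1@example.com"), ("bob", "b@example.com"), ("ann", "a2@example.com")],
)
conn.commit()

dedupe_by_swap(conn)
rows = conn.execute("SELECT id, name, email FROM members ORDER BY id").fetchall()
print(rows)
```

Wrapping the copy, drop and rename in one transaction means a failure part-way through leaves the original table untouched.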
Paul Dixon
Thanks Paul. For those interested, here is what I ran:
CREATE TEMP TABLE tmp_members (name VARCHAR);
INSERT INTO tmp_members SELECT name FROM members GROUP BY name;
SELECT COUNT(name) FROM tmp_members;
DELETE FROM members;
VACUUM members;
SELECT COUNT(name) FROM members;
INSERT INTO members (name) SELECT * FROM tmp_members;
SELECT COUNT(name) FROM members;
SELECT COUNT(DISTINCT name) FROM members;
SELECT name FROM members LIMIT 10;
DROP TABLE tmp_members;
OverTheRainbow
Sorry, I missed that you were using SQLite!
Paul Dixon
+2  A: 

See the following question: Deleting duplicate rows from a table.

The adapted accepted answer from there (which is my answer, so no "theft" here...):

You can do it in a simple way assuming you have a unique ID field: you can delete all records that are the same except for the ID, but don't have "the minimum ID" for their name.

Example query:

DELETE FROM members
WHERE ID NOT IN
(
    SELECT MIN(ID)
    FROM members
    GROUP BY name
)

If you don't have a unique ID column, my recommendation is simply to add an auto-incrementing one: mainly because it's good design, but also because it lets you run the query above.
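
To see the query in action on SQLite, here is a small sketch using Python's built-in sqlite3 module. The table layout and the sample names are assumptions made up for the demo; only the DELETE statement comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical minimal schema: an auto-incrementing id plus the name column.
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
conn.executemany(
    "INSERT INTO members (name) VALUES (?)",
    [("ann",), ("bob",), ("ann",), ("cid",), ("bob",), ("ann",)],
)

# Keep the row with the smallest id for each name and delete every other row.
cursor = conn.execute("""
    DELETE FROM members
    WHERE id NOT IN (
        SELECT MIN(id)
        FROM members
        GROUP BY name
    )
""")
deleted = cursor.rowcount
conn.commit()

remaining = conn.execute("SELECT id, name FROM members ORDER BY id").fetchall()
print(deleted, remaining)
```

On SQLite specifically, ordinary tables (those not declared WITHOUT ROWID) also have an implicit rowid column, so the same DELETE can be written against rowid when there is no explicit ID column.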

Roee Adler
Here's how I understand the above: it groups the rows by name (one row per group if the name is unique, several if it is duplicated), selects the smallest ID from each group, and then deletes any row whose ID isn't among those smallest IDs. Brilliant :) Thanks much Rax.
OverTheRainbow
You got it exactly :)
Roee Adler
A: 

We have a huge database where deleting duplicates is part of the regular maintenance process. We select the unique records with DISTINCT and write them into a temporary table. After a TRUNCATE of the original table, we write the temporary data back into it.

That is one way of doing it and works as a STORED PROCEDURE.
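
A rough sketch of that routine adapted to SQLite, run through Python's built-in sqlite3 module. SQLite has neither TRUNCATE nor stored procedures, so this version uses DELETE FROM without a WHERE clause and runs as a plain script. The columns (name, city) and the rows are invented for the example. Note that DISTINCT only collapses rows that match in every selected column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with exact duplicate rows.
conn.execute("CREATE TABLE members (name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO members (name, city) VALUES (?, ?)",
    [("ann", "Oslo"), ("bob", "Rome"), ("ann", "Oslo"), ("bob", "Rome"), ("ann", "Oslo")],
)
conn.commit()

conn.executescript("""
    BEGIN;
    -- 1. Park the unique rows in a temporary table.
    CREATE TEMP TABLE members_unique AS
        SELECT DISTINCT name, city FROM members;
    -- 2. Empty the original table (stands in for TRUNCATE).
    DELETE FROM members;
    -- 3. Write the unique rows back.
    INSERT INTO members (name, city)
        SELECT name, city FROM members_unique;
    DROP TABLE members_unique;
    COMMIT;
""")

rows = conn.execute("SELECT name, city FROM members ORDER BY name").fetchall()
print(rows)
```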

G Berdal
I have to admit Rax Olgud's answer is much, much more sophisticated and probably runs 100 times quicker! :) I'm thinking about adopting that solution... Deserves +1!
G Berdal