I'd like to select all records from a table (names) where lastname is not unique. Preferably, I would like to delete all records that are duplicates.

How would this be done? Assume that I don't want to rerun one query multiple times until it quits.

+2  A: 

To find which lastnames have duplicates:

  SELECT lastname, COUNT(lastname) AS rowcount
    FROM names
GROUP BY lastname
  HAVING rowcount > 1
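
As a way to try the duplicate-finding query without touching a real database, here is a minimal sketch in Python using the standard sqlite3 module. SQLite stands in for MySQL here, the sample rows are made up, and the `id` column is an assumption; only the `names` table and `lastname` column come from the question.

```python
import sqlite3

# Hypothetical sample data; only the table name `names` and the
# `lastname` column come from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, lastname TEXT)")
conn.executemany(
    "INSERT INTO names (lastname) VALUES (?)",
    [("Smith",), ("Jones",), ("Smith",), ("Brown",), ("Jones",), ("Smith",)],
)

def duplicated_lastnames(conn):
    """Return (lastname, rowcount) for every last name stored more than once."""
    return conn.execute(
        "SELECT lastname, COUNT(lastname) AS rowcount "
        "FROM names GROUP BY lastname "
        "HAVING COUNT(lastname) > 1 "
        "ORDER BY lastname"
    ).fetchall()

dups = duplicated_lastnames(conn)
for lastname, rowcount in dups:
    print(lastname, rowcount)
```

The HAVING clause repeats the aggregate instead of using the alias, which should be accepted by a wider range of databases.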

To delete one duplicate for each duplicated last name, run the following repeatedly until it no longer affects any rows. Not very graceful.

DELETE FROM names
 WHERE id IN (SELECT id
                FROM (SELECT * FROM names) AS t
            GROUP BY lastname
              HAVING COUNT(lastname) > 1)
scompt.com
Now write that as a delete please. :)
Josh K
I'm tempted to downvote simply because of the crappy second query. Surely there must be a simpler way than re-running a query until it stops.
Josh K
A: 

Duplicate of http://stackoverflow.com/questions/18932/sql-how-can-i-remove-duplicate-rows

DELETE names
FROM names
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, lastname
   FROM names
   GROUP BY lastname
) as KeepRows ON
   names.RowId = KeepRows.RowId
WHERE
   KeepRows.RowId IS NULL

Assumption: you have a RowId column.
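
The same "keep the lowest row id per lastname, delete the rest" idea can be sketched as a single pass in Python with the standard sqlite3 module. This is only an illustration under assumptions: SQLite stands in for MySQL, the key column is called `id` rather than RowId, and the sample rows are invented. The extra derived table (`keep_rows`) is not needed by SQLite; it is there because MySQL, as far as I know, refuses a DELETE whose subquery reads the target table directly.

```python
import sqlite3

# Hypothetical sample data; `id` is an assumed key column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, lastname TEXT)")
conn.executemany(
    "INSERT INTO names (lastname) VALUES (?)",
    [("Smith",), ("Jones",), ("Smith",), ("Brown",), ("Jones",), ("Smith",)],
)

def delete_duplicate_lastnames(conn):
    """Delete every row except the lowest-id row of each lastname.

    Returns the number of rows deleted.
    """
    cur = conn.execute(
        "DELETE FROM names WHERE id NOT IN ("
        "  SELECT id FROM ("
        "    SELECT MIN(id) AS id FROM names GROUP BY lastname"
        "  ) AS keep_rows"
        ")"
    )
    conn.commit()
    return cur.rowcount

deleted = delete_duplicate_lastnames(conn)
remaining = conn.execute("SELECT id, lastname FROM names ORDER BY id").fetchall()
print(deleted, remaining)
```

Running it a second time deletes nothing, so there is no need to repeat the query until it stops.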

Glennular
I have a `id` column.
Josh K
A: 
SELECT lastname, COUNT(*) as mycountvar FROM names GROUP BY lastname HAVING mycountvar > 1;

and then

DELETE FROM names WHERE lastname = '$mylastnamevar' LIMIT $mycountvar-1

But why don't you just flag the field "lastname" as unique, so that no duplicates can get in?

oezi
Because duplicates are already in the table. I'm trying to add `lastname` as a `UNIQUE INDEX`.
Josh K
+2  A: 

The fastest and easiest way to delete duplicate records is by issuing a very simple command.

ALTER IGNORE TABLE [TABLENAME] ADD UNIQUE INDEX UNIQUE_INDEX ([FIELDNAME])

This will lock the table. If this is an issue, try:

DELETE t1 FROM table1 t1, table1 t2
WHERE t1.duplicate_field = t2.duplicate_field
  AND t1.unique_field > t2.unique_field

Add more conditions if needed (e.g. AND t1.duplicate_field2 = t2.duplicate_field2), and break the delete up into ranges to run faster.
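
One way to read "break it up into ranges" is to run the delete once per slice of the unique column, committing after each slice so that no single statement touches the whole table. A rough sketch in Python with the standard sqlite3 module follows. The table `names`, the columns `id` and `lastname`, the sample rows, and the `batch_size` parameter are all assumptions for illustration. SQLite has no multi-table DELETE, so a correlated EXISTS is swapped in here for the self-join; the condition it expresses is the same (a row goes if another row with the same value and a lower id exists).

```python
import sqlite3

# Hypothetical sample data; `id` plays the role of unique_field and
# `lastname` the role of duplicate_field.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, lastname TEXT)")
conn.executemany(
    "INSERT INTO names (lastname) VALUES (?)",
    [("Smith",), ("Jones",), ("Smith",), ("Brown",), ("Jones",), ("Smith",)],
)

def delete_duplicates_in_ranges(conn, batch_size):
    """Delete duplicate lastnames one id range at a time.

    Within each range, a row is deleted when an earlier row (lower id)
    with the same lastname exists. Returns the total rows deleted.
    """
    min_id, max_id = conn.execute("SELECT MIN(id), MAX(id) FROM names").fetchone()
    if min_id is None:  # empty table
        return 0
    deleted = 0
    for start in range(min_id, max_id + 1, batch_size):
        cur = conn.execute(
            "DELETE FROM names "
            "WHERE id >= ? AND id < ? "
            "AND EXISTS (SELECT 1 FROM names AS t2 "
            "            WHERE t2.lastname = names.lastname "
            "              AND t2.id < names.id)",
            (start, start + batch_size),
        )
        conn.commit()  # one short transaction per range
        deleted += cur.rowcount
    return deleted

deleted = delete_duplicates_in_ranges(conn, batch_size=2)
remaining = conn.execute("SELECT id, lastname FROM names ORDER BY id").fetchall()
print(deleted, remaining)
```

The lowest-id row of each lastname is never deleted, so the result does not depend on the order in which the ranges are processed.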

Gary
Locking the table isn't an issue. The issue is that there are already duplicate rows.
Josh K
If locking is not an issue, then executing ALTER IGNORE TABLE [TABLENAME] ADD UNIQUE INDEX UNIQUE_INDEX ([FIELDNAME]) will rebuild the table and remove the duplicate records.
Gary
You can't apply a constraint if the data doesn't satisfy it - your suggestion would not work.
OMG Ponies
+1 and accepted. Locked the table temporarily and went to work. No duplicates and no more will be added.
Josh K
OMG, it does work. The IGNORE is the key part of what you are missing.
Gary