I have a script that accidentally inserted the same data into the database twice. Is there a good way to remove the duplicate rows (without scanning through the table, loading every record into an array, and then deleting the duplicate array entries)?
DELETE
FROM t
WHERE ID IN (
    SELECT ID FROM (
        SELECT MAX(ID) AS ID
        FROM t
        GROUP BY {Your Group Criteria Here}
        HAVING COUNT(*) > 1
    ) AS dups -- the derived table avoids MySQL error 1093 (can't select from the table you're deleting from)
)
Note that this removes only the single highest-id duplicate per group each time it runs, so repeat it until it affects zero rows.
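A runnable sketch of that repeat-until-done loop, using Python's built-in sqlite3 as a stand-in for MySQL (the table and column names here are made up for illustration; SQLite happens to accept the unwrapped subquery directly):

```python
import sqlite3

# In-memory stand-in for the real table; schema and data are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.executemany("INSERT INTO t (a, b) VALUES (?, ?)",
                 [("x", "1"), ("x", "1"), ("x", "1"), ("y", "2"), ("y", "2")])

# Each pass deletes only the highest id per duplicated group,
# so loop until no group has more than one row left.
while True:
    cur = conn.execute("""
        DELETE FROM t
        WHERE id IN (
            SELECT MAX(id) FROM t
            GROUP BY a, b
            HAVING COUNT(*) > 1
        )""")
    if cur.rowcount == 0:
        break

print(conn.execute("SELECT a, b FROM t ORDER BY id").fetchall())
# → [('x', '1'), ('y', '2')]
```

The (x, 1) group starts with three copies, so it takes two passes to shrink to one row; that's why the loop matters.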
MySQL supports multi-table DELETE, which is really cool and can help here. You can do a self-join on the equality of all columns except the id, and then delete the matching row with the greater id.
DELETE t2
FROM mytable t1 JOIN mytable t2
USING (column1, column2, column3) -- equi-join; note that NULLs never compare equal here
WHERE t1.id < t2.id;
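SQLite doesn't support MySQL's multi-table DELETE, so this sketch runs the same self-join as a SELECT and then deletes the collected ids; names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE mytable
    (id INTEGER PRIMARY KEY, column1 TEXT, column2 TEXT, column3 TEXT)""")
conn.executemany(
    "INSERT INTO mytable (column1, column2, column3) VALUES (?, ?, ?)",
    [("a", "b", "c"), ("a", "b", "c"), ("d", "e", "f")])

# Same self-join as the MySQL DELETE: each pair of rows agreeing on all
# non-id columns yields the row with the greater id.
dup_ids = [row[0] for row in conn.execute("""
    SELECT t2.id
    FROM mytable t1 JOIN mytable t2
    USING (column1, column2, column3)
    WHERE t1.id < t2.id""")]
conn.executemany("DELETE FROM mytable WHERE id = ?",
                 [(i,) for i in dup_ids])
```

The `t1.id < t2.id` condition guarantees the lowest-id row of each group never appears as `t2`, so exactly one copy survives.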
Or there's the old simple way; I'd be surprised if it weren't the fastest, especially compared to matching against a GROUP BY aggregate.
DELETE m1
FROM mytable m1
WHERE EXISTS
(
    SELECT 1
    FROM (SELECT id, fields FROM mytable) m2 -- derived table works around MySQL error 1093
    WHERE m2.fields = m1.fields
    AND m2.id < m1.id
)
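SQLite allows the correlated subquery on the same table directly, so the EXISTS approach can be sketched there as-is (with a single made-up `col` column standing in for the duplicate-defining fields):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO mytable (col) VALUES (?)",
                 [("a",), ("a",), ("a",), ("b",)])

# A row is deleted whenever an earlier (lower-id) row with the same
# values exists, so exactly the first row of each group survives.
conn.execute("""
    DELETE FROM mytable
    WHERE EXISTS (
        SELECT 1 FROM mytable m2
        WHERE m2.col = mytable.col
          AND m2.id < mytable.id
    )""")

print(conn.execute("SELECT id, col FROM mytable ORDER BY id").fetchall())
# → [(1, 'a'), (4, 'b')]
```

Unlike the MAX(id) variant, one pass suffices here: every non-first row of a group sees the group's first row and is deleted.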
If nothing references into the table by key right now, I'd mysqldump --complete-insert it, strip the primary keys, change the table definition to enforce a unique key that would catch your duplicates, change the INSERTs to REPLACEs, and load the data back in. That gets you a nice clean table without holes in the PK sequence or deleted rows.
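The dump-and-reload trick hinges on REPLACE plus a UNIQUE key: on a key collision, REPLACE drops the old row and inserts the new one. A minimal sketch of that mechanism using SQLite's REPLACE, which behaves the same way here (table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rebuilt table enforces a UNIQUE key over the columns that
# define a duplicate (names are illustrative).
conn.execute("""
    CREATE TABLE clean (
        id INTEGER PRIMARY KEY,
        a TEXT, b TEXT,
        UNIQUE (a, b)
    )""")

# Reloading the stripped dump as REPLACE statements: a second row with
# the same (a, b) replaces the first instead of duplicating it.
for a, b in [("x", "1"), ("x", "1"), ("y", "2")]:
    conn.execute("REPLACE INTO clean (a, b) VALUES (?, ?)", (a, b))

print(conn.execute("SELECT a, b FROM clean ORDER BY a").fetchall())
# → [('x', '1'), ('y', '2')]
```

Because REPLACE is delete-then-insert, surviving rows get fresh autoincrement ids, which is fine here since the dump's primary keys were stripped anyway.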