How can I delete duplicate rows from a MySQL table when a foreign key relationship has already been set up on those rows?
Can the duplicates be merged somehow, and the foreign keys then updated to point at the merged row?

+1  A: 

If the foreign key is `ON DELETE CASCADE`, then deleting the duplicate rows will also delete the dependent rows. For example, if you have a table `customers` and a table `orders`, and a foreign key like `ALTER TABLE orders ADD FOREIGN KEY (customer_id) REFERENCES customers (id) ON DELETE CASCADE`, then deleting a customer will also delete that customer's orders. Similarly, if the foreign key is `ON DELETE SET NULL`, the orders will not be deleted, but their `customer_id` values will be set to NULL.

If neither of these is the desired behaviour, craft a query that resolves the foreign key conflicts by altering the foreign key columns so that they reference the row you want to keep (i.e., update all orders to reference non-duplicate customers), then delete the offending rows.
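As a rough sketch of that query, assuming a hypothetical schema where `customers` has columns `id` and `email`, two customers count as duplicates when they share the same `email`, and the row with the lowest `id` is the one you want to keep (none of this comes from the question, so adapt the duplicate rule to your data):

```sql
-- Assumptions (hypothetical): customers(id, email), orders(customer_id),
-- duplicates share an email, and the lowest id per email is kept.
-- Rows with a NULL email never match the joins and are left untouched.

START TRANSACTION;

-- 1. Repoint every order that references a duplicate to the kept customer.
UPDATE orders o
JOIN customers c ON c.id = o.customer_id
JOIN (
    SELECT email, MIN(id) AS keep_id
    FROM customers
    GROUP BY email
) k ON k.email = c.email
SET o.customer_id = k.keep_id
WHERE o.customer_id <> k.keep_id;

-- 2. Delete every customer for which a same-email customer with a lower id exists.
--    No order references these rows any more, so the foreign key is satisfied.
DELETE c1
FROM customers c1
JOIN customers c2
  ON c1.email = c2.email
 AND c1.id > c2.id;

COMMIT;
```

Running both statements inside one transaction means a failure in the delete does not leave the orders half-migrated.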

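Yet another alternative is to disable foreign key checks temporarily (`SET FOREIGN_KEY_CHECKS = 0`), but deleting the duplicates that way leaves orders pointing at customers that no longer exist, i.e. an inconsistent database, so I wouldn't recommend it.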

tdammers
Could I use the ON DELETE SET NULL to set all the foreign key columns that were linked to duplicates to NULL, then update them in a script to the id directly above it?
chustar
"directly above it" is not exactly a valid SQL construct. A record in an SQL database has no knowledge of what came before or comes after it in the data table. "before" and "after" are imposed by the operator (think of it as the difference between `sort asc` and `sort desc`).
Marc B
You could use a temporary (scratch) table to store the relation between the rows you are going to delete and the corresponding rows you want to keep, e.g. `CREATE TABLE customer_duplicates (duplicate_id INT, original_id INT);` (with column types matching `customers.id`) and FKs to the duplicate and original customers. Decide which customers you want to keep and which are duplicates, and fill the duplicates table accordingly. Then update the orders table so that every order referencing a duplicate customer now references the original customer. Finally, remove the duplicate customers and drop the duplicates table.
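Those steps might look roughly like this. It is only a sketch: the `email` column, the "same email means duplicate, lowest id wins" rule, and the `INT` id type are assumptions for illustration, and the FKs on the mapping table are left out here so that the table itself does not block the final delete.

```sql
-- Hypothetical schema: customers(id INT, email), orders(customer_id INT).

-- 1. Scratch table mapping each duplicate to the row that replaces it.
CREATE TABLE customer_duplicates (
    duplicate_id INT NOT NULL PRIMARY KEY,
    original_id  INT NOT NULL
);

-- 2. Fill it (assumed rule: same email, lowest id is the original).
INSERT INTO customer_duplicates (duplicate_id, original_id)
SELECT c.id, k.keep_id
FROM customers c
JOIN (
    SELECT email, MIN(id) AS keep_id
    FROM customers
    GROUP BY email
) k ON k.email = c.email
WHERE c.id <> k.keep_id;

-- 3. Move the orders from each duplicate to its original.
UPDATE orders o
JOIN customer_duplicates d ON d.duplicate_id = o.customer_id
SET o.customer_id = d.original_id;

-- 4. Remove the duplicates, then the scratch table.
DELETE c
FROM customers c
JOIN customer_duplicates d ON d.duplicate_id = c.id;

DROP TABLE customer_duplicates;
```

The point of the extra table is that you can inspect or hand-correct the mapping after step 2, before anything in `orders` or `customers` is changed.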
tdammers