ansaurus

Question

Removing duplicate SQL records to permit a unique key

Answer 1

+2 A:

In reply to your comment, here's a query that works in MySQL:

delete YourTable
from YourTable
inner join YourTable yt2
on YourTable.product_id = yt2.product_id
and YourTable.id < yt2.id

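This removes only the duplicate rows. The inner join never matches the latest row for each product (the one with the highest id), because no row with a higher id exists for it, so that row is kept. For the same reason, a product that has only one row is left untouched.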
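To see which rows the join condition picks out before deleting anything, the same self-join can be run as a SELECT. The following is a minimal sketch, assuming SQLite (via Python's sqlite3 module) as a stand-in for MySQL; the sample rows and the helper name rows_matched_by_join are made up for illustration.

```python
import sqlite3

# Hypothetical sample data as (id, product_id); product 4 has a single row.
ROWS = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 3), (6, 2), (7, 4)]


def rows_matched_by_join(rows):
    """Return the ids that the self-join selects, i.e. the rows a delete would remove."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE YourTable (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO YourTable VALUES (?, ?)", rows)
    cur = conn.execute(
        """
        SELECT DISTINCT yt.id
        FROM YourTable yt
        INNER JOIN YourTable yt2
            ON yt.product_id = yt2.product_id
           AND yt.id < yt2.id      -- a newer row exists for the same product
        ORDER BY yt.id
        """
    )
    ids = [row[0] for row in cur.fetchall()]
    conn.close()
    return ids


doomed = rows_matched_by_join(ROWS)
survivors = sorted({row[0] for row in ROWS} - set(doomed))
print("would delete:", doomed)
print("would keep:", survivors)
```

The row with the highest id per product never satisfies yt.id < yt2.id, so it does not appear in the result, and neither does the only row of product 4.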

P.S. If you try to alias the table after FROM, MySQL requires you to specify the name of the database, like:

delete <DatabaseName>.yt
from YourTable yt
inner join YourTable yt2
on yt.product_id = yt2.product_id
and yt.id < yt2.id;

Andomar 2010-05-18 22:33:28

Answer 2

+1 A:

I might do the following in SQL Server to eliminate the duplicates:

DELETE FROM Sales
FROM Sales
    INNER JOIN Sales b ON Sales.product_id = b.product_id AND Sales.id < b.id

It looks like the analogous delete statement for MySQL might be:

DELETE FROM Sales 
USING Sales
    INNER JOIN Sales b ON Sales.product_id = b.product_id AND Sales.id < b.id

Michael Petito 2010-05-18 22:36:54

The second one works (at least on my MySQL installation ;))

Andomar 2010-05-18 22:52:46

Yah, I was trying it out when you posted your revised answer following my comment, Andomar. Thanks to you both.

j pimmel 2010-05-18 22:54:54

Answer 3

A:

This type of problem is easier to solve with CTEs and ranking functions. However, you should be able to do something like the following to solve your problem:

-- Delete every row for which a newer row (higher id) exists for the same product
Delete Sales
Where Exists(
            Select 1
            From Sales As S2
            Where S2.product_id = Sales.product_id
                And S2.id > Sales.id
            )
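For comparison, the CTE and ranking-function approach mentioned above might look roughly like this. It is a sketch under assumptions, not the answer's own code: it uses SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), and the sample rows and the helper name dedupe_with_row_number are hypothetical.

```python
import sqlite3

# Hypothetical sample data as (id, product_id).
SAMPLE = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 3), (6, 2)]


def dedupe_with_row_number(rows):
    """Keep only the highest-id row per product_id, using a CTE and ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Sales (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO Sales VALUES (?, ?)", rows)
    conn.execute(
        """
        WITH ranked AS (
            SELECT id,
                   ROW_NUMBER() OVER (
                       PARTITION BY product_id
                       ORDER BY id DESC      -- rn = 1 is the newest row
                   ) AS rn
            FROM Sales
        )
        DELETE FROM Sales
        WHERE id IN (SELECT id FROM ranked WHERE rn > 1)
        """
    )
    remaining = conn.execute(
        "SELECT id, product_id FROM Sales ORDER BY id"
    ).fetchall()
    conn.close()
    return remaining


print(dedupe_with_row_number(SAMPLE))
```

Changing the ORDER BY inside the window to id ASC would keep the oldest row per product instead of the newest.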

Thomas 2010-05-18 23:03:24

Answer 4

+1 A:

Perhaps use ALTER IGNORE TABLE ... ADD UNIQUE KEY. For example:

describe sales;
+------------+---------+------+-----+---------+----------------+
| Field      | Type    | Null | Key | Default | Extra          |
+------------+---------+------+-----+---------+----------------+
| id         | int(11) | NO   | PRI | NULL    | auto_increment | 
| product_id | int(11) | NO   |     | NULL    |                | 
+------------+---------+------+-----+---------+----------------+

select * from sales;
+----+------------+
| id | product_id |
+----+------------+
|  1 |          1 | 
|  2 |          1 | 
|  3 |          2 | 
|  4 |          3 | 
|  5 |          3 | 
|  6 |          2 | 
+----+------------+

ALTER IGNORE TABLE sales ADD UNIQUE KEY idx1(product_id), ORDER BY id DESC; 
Query OK, 6 rows affected (0.03 sec)
Records: 6  Duplicates: 3  Warnings: 0


select * from sales;
+----+------------+
| id | product_id |
+----+------------+
|  6 |          2 | 
|  5 |          3 | 
|  2 |          1 | 
+----+------------+

See this Pythian post for more information.

Note that the ids end up in reverse order. I don't think this matters, since the order of the ids should not matter in a database (as far as I know!). If it does bother you, the post linked above shows a way to fix that too, but it involves creating a temporary table, which requires more disk space than the in-place method I posted above.
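As a rough illustration of the copy-into-a-second-table idea (not necessarily what the linked post does), here is a sketch that assumes SQLite via Python's sqlite3 module in place of MySQL. The table name sales_dedup and the helper name dedupe_via_copy are hypothetical.

```python
import sqlite3

# Same rows as the example table above, as (id, product_id).
SALES = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 3), (6, 2)]


def dedupe_via_copy(rows):
    """Copy the newest row per product into a table that already has the unique key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    # Second table with the unique constraint in place from the start.
    conn.execute(
        "CREATE TABLE sales_dedup ("
        "id INTEGER PRIMARY KEY, product_id INTEGER NOT NULL UNIQUE)"
    )
    # One row per product: the one with the highest id.
    conn.execute(
        "INSERT INTO sales_dedup (id, product_id) "
        "SELECT MAX(id), product_id FROM sales GROUP BY product_id"
    )
    # Swap the deduplicated table in for the original.
    conn.execute("DROP TABLE sales")
    conn.execute("ALTER TABLE sales_dedup RENAME TO sales")

    result = conn.execute(
        "SELECT id, product_id FROM sales ORDER BY id"
    ).fetchall()
    conn.close()
    return result


print(dedupe_via_copy(SALES))
```

Both tables exist at the same time until the original is dropped, which is where the extra disk space goes.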

unutbu 2010-05-18 23:19:01
