ansaurus

Question

Remove duplicate entries from database with conditions

Answer 1

A:

Some people complain that DISTINCT is slow to execute, but that probably won't affect you, and it is likely to be faster than the alternatives:

select DISTINCT id, tracker, time, result
from table;

wallyk 2010-09-16 02:46:19

I need to record any and all changes to the results, so this wouldn't work, unfortunately.

Duncan 2010-09-16 04:02:05

Answer 2

+1 A:

Use:

   DELETE a
     FROM YOUR_TABLE a
LEFT JOIN (SELECT MAX(t.id) AS latest_id
             FROM YOUR_TABLE t
         GROUP BY t.tracker, t.result) b ON b.latest_id = a.id
    WHERE b.latest_id IS NULL

Alternate using IN:

DELETE FROM YOUR_TABLE
 WHERE id NOT IN (SELECT x.latest_id
                   FROM (SELECT MAX(t.id) AS latest_id
                          FROM YOUR_TABLE t
                      GROUP BY t.tracker, t.result) x )
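To try the query on throwaway data before running it against a real table, a minimal sketch along these lines may help. It uses Python's standard `sqlite3` module as a stand-in for MySQL (an assumption for convenience; SQLite does not need the wrapper subquery, but accepts it), and the sample rows are hypothetical:

```python
import sqlite3

# Hypothetical sample data; column names follow the thread (id, tracker, time, result).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table ("
    "id INTEGER PRIMARY KEY, tracker TEXT, time TEXT, result TEXT)"
)
conn.executemany(
    "INSERT INTO your_table (id, tracker, time, result) VALUES (?, ?, ?, ?)",
    [
        (1, "a", "2010-09-16 01:00", "ok"),
        (2, "a", "2010-09-16 01:05", "ok"),    # same (tracker, result) as id 1
        (3, "a", "2010-09-16 01:10", "fail"),
        (4, "b", "2010-09-16 01:00", "ok"),
        (5, "a", "2010-09-16 01:15", "ok"),    # same (tracker, result) as ids 1 and 2
    ],
)

# Same shape as the NOT IN query: keep only the highest id per (tracker, result).
conn.execute(
    """
    DELETE FROM your_table
     WHERE id NOT IN (SELECT x.latest_id
                        FROM (SELECT MAX(t.id) AS latest_id
                                FROM your_table t
                            GROUP BY t.tracker, t.result) x)
    """
)
conn.commit()

remaining = [row[0] for row in conn.execute("SELECT id FROM your_table ORDER BY id")]
print(remaining)
```

Note that grouping by (tracker, result) keeps one row per distinct result per tracker, so a result that recurs after a change (ok, fail, ok in the sample) is collapsed to its latest occurrence.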

OMG Ponies 2010-09-16 02:56:08

The top one did what I was after: it reduced the number of rows by about 90%, which should certainly make queries and backups quicker. Many thanks!

Duncan 2010-09-16 04:00:19

Why do you need the wrapper subquery x in the second suggested query?

MattSmith 2010-09-16 05:50:16

@MattSmith: Without the wrapper, you'll get MySQL error #1093 ("You can't specify target table 'YOUR_TABLE' for update in FROM clause"), because the subquery reads from the same table the DELETE is modifying.

OMG Ponies 2010-09-16 15:29:30

Answer 3

A:

I think you want a unique index on the table:

ALTER IGNORE TABLE table ADD UNIQUE INDEX (tracker, time, result)

http://dev.mysql.com/doc/refman/5.1/en/alter-table.html

You'll have to use INSERT IGNORE... when adding new rows, because an insert that would duplicate an existing (tracker, time, result) key will otherwise cause an error.
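As a rough illustration of how such a unique key behaves on insert, here is a small sketch using Python's standard `sqlite3` module rather than MySQL (an assumption for the sake of a runnable example; SQLite spells the statement `INSERT OR IGNORE`). The index name `uniq_reading` and the sample rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table ("
    "id INTEGER PRIMARY KEY, tracker TEXT, time TEXT, result TEXT)"
)
# Unique key over the same three columns as in the ALTER TABLE statement.
conn.execute("CREATE UNIQUE INDEX uniq_reading ON your_table (tracker, time, result)")

rows = [
    ("a", "2010-09-16 01:00", "ok"),
    ("a", "2010-09-16 01:00", "ok"),   # exact duplicate of the key: skipped
    ("a", "2010-09-16 01:05", "ok"),   # same result, later time: still inserted
]
for row in rows:
    # SQLite's counterpart to MySQL's INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO your_table (tracker, time, result) VALUES (?, ?, ?)",
        row,
    )
conn.commit()

stored = conn.execute(
    "SELECT tracker, time, result FROM your_table ORDER BY id"
).fetchall()
print(stored)
```

Because `time` is part of the key, only rows identical in all three columns are rejected; an unchanged result recorded at a later time is still stored.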

Joshua Martell 2010-09-16 03:14:47

It's not explicitly stated, but the OP does read as though that's intended... but you can't apply the constraint until the data satisfies it. And the OP states they want to DELETE the duplicates...

OMG Ponies 2010-09-16 03:22:10

I had thoughts along these lines for future recording, but surely a result captured a few minutes after another will still produce a new row, even if the result is the same, because the time is different?

Duncan 2010-09-16 04:05:41
