How to remove duplicates in this setup?

id  A       B
1   apple   2
2   orange  1
3   apple   2
4   apple   1

Here I want to remove the duplicate (apple, 2), which occurs twice. The id values are unique; if they were not, I would simply use the DISTINCT keyword. Can I somehow make a key out of columns A and B and then use the DISTINCT keyword on that to get what I need? Many thanks for your replies.

+3  A: 
delete from myTable 
where id not in
(select min(id)
from myTable
group by A, B)

That is, the select in brackets returns the first (lowest) id for each grouping of A and B; deleting all ids that are not in this set removes every occurrence of an A-plus-B combination that is "subsequent" to its first occurrence.
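
As a quick check of this logic, here is a minimal sketch that runs the same DELETE against the sample rows from the question. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL (an assumption made only so the example is self-contained); the table name myTable and the columns id, A, B are taken from the answer.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, A TEXT, B INTEGER)")
conn.executemany(
    "INSERT INTO myTable (id, A, B) VALUES (?, ?, ?)",
    [(1, "apple", 2), (2, "orange", 1), (3, "apple", 2), (4, "apple", 1)],
)

# Keep the lowest id in each (A, B) group and delete every other row.
conn.execute(
    "DELETE FROM myTable "
    "WHERE id NOT IN (SELECT MIN(id) FROM myTable GROUP BY A, B)"
)

remaining = conn.execute("SELECT id, A, B FROM myTable ORDER BY id").fetchall()
print(remaining)
```

With the sample data, only the row with id 3 is removed; ids 1, 2 and 4 remain.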

EDIT: in MySQL, a DELETE whose subquery reads directly from the target table seems to be problematic; see the bug report:

http://bugs.mysql.com/bug.php?id=5037

A possible workaround is to do this:

delete from myTable 
where id not in
(
      select minid from 
      (select min(id) as minid from myTable group by A, B) as newtable
)
davek
Nice. Very nice.
Larry Lustig
How does this perform relative to my answer below? I'm not enough of a DB guru to analyze it...
Benjamin Cox
Nice. This removes only the row where id=3, rather than both rows with id 1 and 3.
ps
@Benjamin: I'm not sure: my guess is that it will depend on the data distribution. But this version should be portable to other databases and for me - at least! - it's more readable.
davek
Definitely more readable - glad to hear it's more portable as well. I'll be testing this out next week on my own data set. Thanks, Dave!
Benjamin Cox
I get this error when using this construct. I can always use a temp table, of course. ERROR 1093 (HY000): You can't specify target table 'myTable' for update in FROM clause.
Senthil
@Senthil: you're right: DELETE has a problem with inline subqueries referring to the target table. See my edited answer.
davek
+1  A: 
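DELETE FROM fruit_table FT1
WHERE EXISTS
(
    -- a row with the same values and a lower id exists, so FT1 is a later duplicate
    SELECT * FROM fruit_table FT2 
    WHERE FT2.fruit_name_column = FT1.fruit_name_column
    AND   FT2.fruit_integer_column = FT1.fruit_integer_column
    AND   FT2.id < FT1.id
)

This keeps the row with the lowest id in each set of duplicates and removes the rest. (Comparing with <> instead of < would match every row in a duplicate set, so all copies would be removed.)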
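
Before running a correlated DELETE like this, it can help to run its WHERE clause as a SELECT and see which ids it matches. The sketch below does that for two choices of the id comparison, again using an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL (an assumption); matched_ids is a hypothetical helper introduced only for this illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fruit_table (id INTEGER PRIMARY KEY, "
    "fruit_name_column TEXT, fruit_integer_column INTEGER)"
)
conn.executemany(
    "INSERT INTO fruit_table VALUES (?, ?, ?)",
    [(1, "apple", 2), (2, "orange", 1), (3, "apple", 2), (4, "apple", 1)],
)

def matched_ids(op):
    """Hypothetical helper: ids the EXISTS condition matches for a given id comparison."""
    sql = (
        "SELECT FT1.id FROM fruit_table FT1 WHERE EXISTS ("
        " SELECT 1 FROM fruit_table FT2"
        " WHERE FT2.fruit_name_column = FT1.fruit_name_column"
        " AND FT2.fruit_integer_column = FT1.fruit_integer_column"
        f" AND FT2.id {op} FT1.id) ORDER BY FT1.id"
    )
    return [row[0] for row in conn.execute(sql)]

not_equal = matched_ids("<>")  # every member of a duplicate set matches
lower_id = matched_ids("<")    # only rows that have a lower-id twin match
print(not_equal, lower_id)
```

On the sample data the <> comparison matches ids 1 and 3 (both copies), while < matches only id 3.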

Larry Lustig
A: 

You could use a temporary table with the data you want:

insert into temp_table
select min(id), A, B
from myTable
group by A, B
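
One way the temporary-table idea could be carried through is sketched below. The answer only shows the insert into temp_table; the table name myTable, the temp_table column layout, and the follow-up step that copies the rows back are all assumptions added for illustration, and an in-memory SQLite database (via Python's sqlite3 module) stands in for MySQL.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, A TEXT, B INTEGER)")
conn.executemany(
    "INSERT INTO myTable (id, A, B) VALUES (?, ?, ?)",
    [(1, "apple", 2), (2, "orange", 1), (3, "apple", 2), (4, "apple", 1)],
)

# Step 1 (from the answer): copy one row per (A, B) pair into a temporary table.
conn.execute("CREATE TEMPORARY TABLE temp_table (id INTEGER, A TEXT, B INTEGER)")
conn.execute(
    "INSERT INTO temp_table SELECT MIN(id), A, B FROM myTable GROUP BY A, B"
)

# Step 2 (assumed follow-up, not in the answer): replace the original contents.
conn.execute("DELETE FROM myTable")
conn.execute("INSERT INTO myTable SELECT id, A, B FROM temp_table")

deduped = conn.execute("SELECT id, A, B FROM myTable ORDER BY id").fetchall()
print(deduped)
```

Because the DELETE no longer reads from the table it modifies, this route sidesteps the kind of restriction reported as ERROR 1093 elsewhere in this thread.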
Pablo Santa Cruz
A: 

I'm not exactly sure what you're asking here. If you don't want duplicates of the A and B columns, then do just what you mentioned: SELECT DISTINCT A, B FROM XXX. Maybe you could post an example of the type of result you would like to see.

BryanD
I guess "group by" is what I was missing; the other posts have clarified this.
Senthil
A: 
DELETE
FROM mytable
USING mytable, mytable AS vtable
WHERE vtable.id > mytable.id
AND mytable.A = vtable.A
AND mytable.B = vtable.B
Benjamin Cox