tags:

views:

54

answers:

2

Hi, i have mySQL database with some duplicate entries. They have the same field - phone. But they also had fields which differs. At example i have two entries with same phone, but first entry has rating filed = default_value and second entry has rating field = 5. So i must merge this entries and only then delete duplicates...

More common example:

    entry1.phone==123
entry1.phone==etry2.phone
    entry1.rating!=entry2.phone
    entry1.rating==default_value(0)
    entry2.rating==5
    merge
    entry1.phone==123
    entry1.rating==5
    entry2 is deleted
+1  A: 

I don't think you can do this in SQL efficiently. One slow way to do it is something like:

CREATE TEMPORARY TABLE tmp_table (...);
INSERT INTO tmp_table SELECT phone, max(rating) FROM table GROUP BY phone;
TRUNCATE table;
INSERT INTO table SELECT * FROM tmp_table;

A better way would be a stored procedure or an external script. Select all rows from the table ordered by phone and do the grouping/merging/deleting manually (iterate over the results, compare to the phone value from the previous row, if it's different you have a new group, etc.). Writing stored procedures in MySQL is painful though, so I'm not going to write the code for you. :)

Lukáš Lalinský
+1  A: 

It sounds like you don't really need to merge any records if you are just trying to update the first record with the non-default rating. I think you can just delete any records with the default rating.

Select a.*
from tbl a
inner join tbl b
on a.Phone = b.Phone
    and a.Rating < b.Rating


Delete a
from tbl a
inner join tbl b
on a.Phone = b.Phone
    and a.Rating < b.Rating

If you truly have to update the first record and delete the second record, you can do something similar if you have an autoincrement ID. The next example is what I would do to update the first record if an ID exists. This is only reliable if you only have phone numbers duplicated one time.

Update a
Set a.Rating = b.Rating
from tbl a
inner join tbl b
on a.Phone = b.Phone
    and a.Rating < b.Rating
    and a.ID < b.ID


Delete a
from tbl a
inner join tbl b
on a.Phone = b.Phone
    and a.Rating = b.Rating
    and b.ID > a.ID

Hope this helps.

-Ranthalion

Ranthalion