I have a table with the following fields:

id (Unique)
url (Unique)
title
company
site_id

Now, I need to remove rows that share the same title, company and site_id. One way to do it is to run the following SQL and process the result with a script (PHP):

SELECT title, site_id, location, id, count( * ) 
FROM jobs
GROUP BY site_id, company, title, location
HAVING count( * ) >1

After running this query, I can remove the duplicates with a server-side script. But I want to know whether this can be done using only an SQL query.

A: 

I have this query snippet for SQL Server, but I think it can be used in other DBMSs with small changes:

DELETE
FROM Table
WHERE Table.idTable IN (
    SELECT MAX(idTable)
    FROM Table
    GROUP BY field1, field2, field3
    HAVING COUNT(*) > 1)

I forgot to mention that this query removes only the row with the highest id in each group of duplicates, so the row with the lowest id is kept. If that works for you, try this query:

DELETE
FROM jobs
WHERE jobs.id IN  (  
    SELECT MAX(id)
    FROM jobs
    GROUP BY site_id, company, title, location
    HAVING COUNT(*) > 1)
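
The same idea can be turned around to keep one row per group and delete all the others, which also covers groups with three or more duplicates. What follows is a minimal sketch rather than a tested solution. It assumes the `jobs` table from the question, that the row with the lowest `id` is the one worth keeping, and that duplicates are defined by `site_id`, `company` and `title` only. The alias `keep_rows` and the column name `keep_id` are made up for this example. The extra derived table is there because MySQL otherwise rejects a subquery that reads from the table being deleted from.

```sql
-- Keep the lowest id in each (site_id, company, title) group, delete the rest.
-- The inner aggregate sits in a derived table (keep_rows, a hypothetical alias);
-- because it contains GROUP BY, MySQL materializes it before the DELETE runs,
-- so the statement no longer reads directly from its own target table.
DELETE FROM jobs
WHERE id NOT IN (
    SELECT keep_id
    FROM (
        SELECT MIN(id) AS keep_id
        FROM jobs
        GROUP BY site_id, company, title
    ) AS keep_rows
);
```

Running the inner `SELECT MIN(id) ... GROUP BY ...` on its own first shows which rows would survive before anything is deleted.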
eiefai
That won't work if there are more than two duplicates in a group.
OMG Ponies
Unfortunately, MySQL does not allow you to select from the table you are deleting from: `ERROR 1093: You can't specify target table 'Table' for update in FROM clause`
Andomar
OMG Ponies, I know that; this is just a snippet that I use sometimes and it seemed to fit the question, which is why I said that it needed to be changed. Thanks for the comment. Andomar, I didn't know that. Thanks to you too.
eiefai
+2  A: 

A really easy way to do this is to add a UNIQUE index on the 3 columns. When you write the ALTER statement, include the IGNORE keyword. Like so:

ALTER IGNORE TABLE jobs ADD UNIQUE INDEX idx_name (site_id, title, company );

This will drop all the duplicate rows. As an added benefit, future INSERTs that are duplicates will error out. As always, you may want to take a backup before running something like this...
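
One caveat: the `IGNORE` clause of `ALTER TABLE` was deprecated in MySQL 5.6.17 and removed in 5.7.4, so the statement above only runs on older servers. On newer versions a similar effect can probably be had by copying the rows into a table that already carries the unique index. The sketch below is one way to do that, not a drop-in replacement. The table names `jobs_dedup` and `jobs_old` are invented for the example, and it assumes the three columns are short enough to be indexed and that no foreign keys or triggers depend on `jobs`.

```sql
-- 1. Empty copy of the table with the same columns and indexes.
CREATE TABLE jobs_dedup LIKE jobs;

-- 2. Add the unique index while the copy is still empty.
ALTER TABLE jobs_dedup ADD UNIQUE INDEX idx_name (site_id, title, company);

-- 3. Copy rows in id order; rows that collide with idx_name are skipped,
--    so the lowest id of each group is the one that gets kept.
INSERT IGNORE INTO jobs_dedup
SELECT * FROM jobs ORDER BY id;

-- 4. Swap the tables; jobs_old keeps the original data as a fallback.
RENAME TABLE jobs TO jobs_old, jobs_dedup TO jobs;
```

Two things to keep in mind: `INSERT IGNORE` downgrades other errors to warnings as well, not only duplicate-key errors, and a MySQL unique index allows several rows with `NULL` in an indexed column, so rows with a `NULL` title, company or site_id would not be deduplicated this way.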

Chris Henry
[Interesting](http://dev.mysql.com/doc/refman/5.1/en/alter-table.html), but the assumptions the IGNORE clause makes when removing those duplicates are a concern and might not match your needs. Does having incorrect values truncated to the closest acceptable match sound good to you?
OMG Ponies
In this particular case, that's definitely true. The collation of the title and company columns definitely matters. What, exactly, does "incorrect values" mean? I smell another question...
Chris Henry
this did the job, thanks a lot!
Chetan
+2  A: 

MySQL has restrictions about referring to the table you are deleting from. You can work around that with a temporary table, like:

create temporary table tmpTable (id int);

insert  tmpTable
        (id)
select  id
from    YourTable yt
where   exists
        (
        select  *
        from    YourTable yt2
        where   yt2.title = yt.title
                and yt2.company = yt.company
                and yt2.site_id = yt.site_id
                and yt2.id > yt.id
        );

delete  
from    YourTable
where   ID in (select id from tmpTable);
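
MySQL's multi-table `DELETE` syntax can often express the same thing in a single statement, because a self-join in the table list is not subject to the subquery restriction. Here is a sketch, assuming the question's `jobs` table stands in for `YourTable`. It keeps the same rule as the statements above: a row is deleted when a duplicate with a higher `id` exists, so the highest `id` in each group survives. The aliases `older` and `newer` are just illustrative names.

```sql
-- Delete every row for which another row with the same
-- (title, company, site_id) and a higher id exists.
DELETE older
FROM jobs AS older
JOIN jobs AS newer
  ON  newer.title   = older.title
  AND newer.company = older.company
  AND newer.site_id = older.site_id
  AND newer.id      > older.id;
```

Since `=` never matches `NULL`, rows with a `NULL` in any of the three columns are left alone; MySQL's NULL-safe operator `<=>` could be substituted if such rows should count as duplicates too.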
Andomar
+1: Your MySQL-fu is better than mine
OMG Ponies