tags:

views:

77

answers:

5

Guys, I have two basic tables (id, first_name).

I want to run table 1 against table 2, delete duplicates, and spit out the cleansed version of table 1.

What's the easiest way to do this with PHP/MySQL?

Thanks.

A: 

Step 1 :

Create table t3 as (
   select t1.* 
   from table1 t1 left outer join table2 t2 on t1.id = t2.id and t1.first_name = t2.first_name
   where t2.id is null
)

Step 2 :

Delete table1, table2

Step 3 :

Rename table3 in table1

I don't know if it is the easiest way, but it is the fastest if you have big tables.

Scorpi0
+2  A: 

This will delete all records from t1 that also exist in t2, leaving you with a stripped-down t1, but possibly with records in t2 that don't exist in t1:

DELETE FROM table1 t1
WHERE t1.id IN
    (SELECT id from table2 t2
        WHERE t2.id = t1.id AND t2.first_name = t1.first_name
    )

*You may want to consider using EXISTS instead of IN, as per Brian Hooper's suggestion.

This will combine the two tables into a third table (t3), removing duplicates along the way:

SELECT * INTO t3 
FROM t1 UNION SELECT * FROM t2

That will work for SQL Server (definitely) and MySQL (I think) but MySQL supports CREATE TABLE table_name AS select-statement, so you could use:

CREATE TABLE t3 AS
(SELECT * FROM t1 UNION DISTINCT SELECT * FROM t2)

*The DISTINCT keyword is optional - it's the default behaviour

gkrogers
+1  A: 
DELETE FROM table1 t1
    WHERE EXISTS (SELECT *
                      FROM table2 t2
                      WHERE t2.id         = t1.id AND
                            t2.first_name = t1.first_name);

I'd prefer the exists to IN which I have had trouble with in the past, with it taking a long time.

Brian Hooper
A: 

Why not use

 DELETE FROM table1 WHERE first_name IN (SELECT first_name from table2)

? Correct me if I missed something.

SpawnCxy
There's nothing to indicate that first_name is unique. E.g. you could have a table1 full of John and Jims, but with different IDs. In which case you'd want to keep them unless they appeared in table2 with the same ID.
gkrogers
@gkrogers,oh that's right.Thanks for correcting.
SpawnCxy
A: 

This is without using subquery (not tested):

DELETE t1 
FROM   table1 t1
JOIN   table2 t2 
    ON  t2.id = t1.id 
    AND t2.first_name = t1.first_name
Vili