tags:

views:

235

answers:

12

I have two tables, for example:

Table A                                  Table B
=======                                  =======

Name         | Color                     Name         | Color
----------------------                   ----------------------
Mickey Mouse | red                       Mickey Mouse | red
Donald Duck  | green                     Donald Duck  | blue
Donald Duck  | blue                      Minnie       | red
Goofy        | black
Minnie       | red

Table A is my source table and B is the destination table. Now I need a query which finds all the different (additional) rows in table A so table B can be updated with those rows. So I need a query which finds me the following rows from table A:

Name         | Color  
----------------------
Donald Duck  | green  
Goofy        | black

What is a good approach for such a query? It should be as efficient as possible (avoid too many joins). Thanks for any help!

A: 
Select A.Name,A.Color From TableA as A , TableB as B where A.Name != B.Name and A.Color != B.Color
john misoskian
"Select A.Name,A.Color From TableA as A , TableB as B where A.Name != B.Name and A.Color != B.Color" this is old sql join syntax, and should be avoided
KM
This is just plain wrong. You're comparing individual records against individual records. The result will duplicate every record in table A as many times as there are records in table B that don't match. -1
Welbog
+8  A: 

I would use a NOT EXISTS structure.

SELECT Name, Color
FROM TableA
WHERE NOT EXISTS (
SELECT 1 FROM TableB
WHERE TableA.Name = TableB.Name 
AND TableA.Color = TableB.Color)
dpmattingly
NOT EXISTS uses a correlated subquery, so on larger datasets it is significantly faster than "NOT IN".
Guy Starbuck
+1 for readability over Justin's answer; however, Justin's is likely faster and IMHO better.
sheepsimulator
Thanks a lot! This is just what I needed!
+1  A: 

A NOT EXISTS subquery should resolve to an outer join:

SELECT Name, Color
FROM TableA
WHERE NOT EXISTS (
  SELECT 1
  FROM TableB
  WHERE TableA.Color = TableB.Color
  AND   TableA.Name  = TableB.Name
)

Or you could just use an outer join directly:

SELECT TableA.Name, TableA.Color
FROM TableA
LEFT OUTER JOIN TableB
  ON  TableA.Name  = TableB.Name
  AND TableA.Color = TableB.Color
WHERE TableB.Name IS NULL

They should be equally performant; it's a question of which you feel is more intuitive.

Welbog
+3  A: 
Select A.Name, A.Color
From A left join B on A.Name = B.Name and A.Color = B.Color
Where B.Name is null
Jason Punyon
+2  A: 

In SQL Server 2008, you can use the "EXCEPT" operator, which is used like a UNION but returns everything from the first query except where it is also in the second:

SELECT * FROM TABLA EXCEPT SELECT * FROM TABLEB

I understand that Oracle has a "MINUS" operator that does the same thing.

Guy Starbuck
+7  A: 
SELECT a.Name, a.Color
FROM a LEFT OUTER JOIN b ON (a.Name = b.Name AND a.Color = b.Color)
WHERE b.Name IS NULL AND b.Color IS NULL
Justin Balvanz
Why testing for both Name and Color being NULL? Surely testing for one or the other is enough.
Bill Karwin
+1  A: 

You can use the EXCEPT operator, which is the opposite of UNION. In Oracle, the equivalent is MINUS.

SELECT * FROM TABLE_A
EXCEPT
SELECT * FROM TABLE_B
JosephStyons
A: 

SELECT TableA.Name, TableA.Color FROM TableA WHERE TableA.Name + TableA.Color NOT IN (SELECT TableB.Name + TableB.Color FROM TableB)

Cynthia
A: 

insert into B select a.Name, a.Color from A a left join B b ON a.Name = b.Name And a.Color = b.Color where b.Color is null and b.Name is null

Orry
+2  A: 

I usually add a column "updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP", and I use its value to check when a new row is inserted or an existing one is modified.

In an application I developed I needed to solve a problem similar to yours, so I saved somewhere the MAX(updated) of B, and then with a query I identified all the rows where A.updated>B.updated, and the result are all new+modified rows.

Since the field default value is CURRENT_TIMESTAMP and it auto-updates "ON UPDATE" you never have to explicitly set its value.

Massimiliano Torromeo
+1  A: 

There are many correct answers up already, but I want to bring up a philosophical point:

Is this database schema really viable in a production environment inside a single schema?

Does it really make sense to have two tables containing data, and then writing a query to compare one to the other? I think it would make sense to have just a single table, and perhaps put in a date identifier to find records added after a certain point.

The only situation I can think of where you'd want to do this is where you have two separate databases and you want to "synchronize" them, or when you would want to find differences between the two, say, comparing a backup and production.

sheepsimulator
+1  A: 

In Oracle you would probably use:

MERGE INTO b USING
  (SELECT name, color 
     FROM a) src 
 ON (src.name = b.name AND color = src.color)
 WHEN NOT MATCHED THEN 
    INSERT (name, color)
    VALUES (src.name, src.color);

If your table has a primary key (do you really have tables without one?) like NAME, and you would like to INSERT or UPDATE depending on the existence of the record in table B, you would use:

MERGE INTO b USING 
  (SELECT name, color 
     FROM a) src 
  ON (src.name = b.name) 
  WHEN NOT MATCHED THEN 
    INSERT (name, color)
    VALUES (src.name, src.color) 
  WHEN MATCHED THEN 
    UPDATE 
    SET color = src.color;

I take it that SQL Server also has a MERGE statement or similar.

Melle