I need to copy over rows from Table B to Table A. The requirement is to only insert rows that are not already in A.

My question is: which of the following two is more efficient?

A)

   INSERT INTO A (x, y, z)
   SELECT x, y, z
   FROM B b
   WHERE b.id NOT IN (SELECT id FROM A);

B)

   INSERT INTO A (x, y, z)
   SELECT b.x, b.y, b.z
   FROM B b LEFT OUTER JOIN A a
     ON b.id = a.id
   WHERE a.id is NULL;

I am assuming the answer depends upon the size of the tables. But I wanted to know if there is something glaringly obvious about using one approach over the other.

To reduce the vagueness, let's say Table B will have fewer than 50K rows, and Table A will always be 1 to 5 times the size of Table B.

If anyone has any other more efficient ways to do this, do tell.

+4  A: 

To add another option:

INSERT INTO A (x, y, z)
SELECT B.x, B.y, B.z
FROM B
WHERE NOT EXISTS(SELECT * FROM A WHERE A.id = B.id)

I usually go with the LEFT JOIN approach. But, if you want to know truly what is most efficient, run some tests on your environment. See what the execution plans for each approach are (you may find that multiple approaches actually result in the same execution plan).
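One way to act on the "run some tests" advice is sketched below, using Python's built-in sqlite3 module as a stand-in engine. The table definitions, the sample ids and the choice of SQLite are assumptions made for illustration only; SQL Server will produce its own plans, so the same three queries should be rerun there and compared using that server's execution plan output. Only the SELECT part is compared, since that is where the three approaches differ.

```python
import sqlite3

# Hypothetical schema and data: A holds ids 1-3, B holds ids 2-6.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, x INT, y INT, z INT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, x INT, y INT, z INT);
""")
conn.executemany("INSERT INTO A VALUES (?, ?, ?, ?)",
                 [(i, i, i, i) for i in range(1, 4)])
conn.executemany("INSERT INTO B VALUES (?, ?, ?, ?)",
                 [(i, i, i, i) for i in range(2, 7)])

# The three ways of finding rows of B that are missing from A.
CANDIDATES = {
    "NOT IN": "SELECT b.id FROM B b WHERE b.id NOT IN (SELECT id FROM A)",
    "LEFT JOIN": "SELECT b.id FROM B b LEFT OUTER JOIN A a ON b.id = a.id "
                 "WHERE a.id IS NULL",
    "NOT EXISTS": "SELECT b.id FROM B b WHERE NOT EXISTS "
                  "(SELECT * FROM A a WHERE a.id = b.id)",
}

results = {}
for name, sql in CANDIDATES.items():
    results[name] = sorted(row[0] for row in conn.execute(sql))
    print(name, "->", results[name])
    # SQLite's plan output; the last column is the readable description.
    for step in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print("    ", step[-1])
```

If all three return the same ids, the choice comes down to the plans your own engine produces and to readability.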

AdaTheDev
+1, the best answer IMHO. I'd use exists
gbn
A: 

It shouldn't matter - a good optimizer will treat these identically. In practice, I have seen quirky execution plans in exactly this case, but I have been known to use both styles interchangeably, depending on mood, readability and complexity of the query.

In SQL Server, option A is not available when you need to match on a tuple of more than a single column, unless you use some kind of concatenation workaround (which I do not recommend). That brings us to cat-skinning option C (which I also use, especially when the joins are really squirrelly), which extends to tuples directly:

-- Single-column key
INSERT INTO A (x, y, z)
SELECT b.x, b.y, b.z
FROM B b
WHERE NOT EXISTS (SELECT * FROM A a WHERE a.id = b.id);

-- Composite key (id1, id2)
INSERT INTO A (x, y, z)
SELECT b.x, b.y, b.z
FROM B b
WHERE NOT EXISTS (SELECT * FROM A a WHERE a.id1 = b.id1 AND a.id2 = b.id2);
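To make the tuple case concrete, here is a small sketch run against an in-memory SQLite database from Python. The two-column key (id1, id2), the column x and the sample rows are hypothetical, and SQLite merely stands in for SQL Server. It contrasts NOT EXISTS on the full key with a naive attempt that applies NOT IN to each key column separately, which is not equivalent because it rejects any row that shares even one column value with A.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id1 INT, id2 INT, x INT, PRIMARY KEY (id1, id2));
    CREATE TABLE B (id1 INT, id2 INT, x INT, PRIMARY KEY (id1, id2));
    INSERT INTO A VALUES (1, 1, 10), (1, 2, 20);
    INSERT INTO B VALUES (1, 2, 20), (2, 1, 30), (2, 2, 40);
""")

# Naive per-column NOT IN: wrong for tuples, it tests each column alone.
naive = conn.execute("""
    SELECT b.id1, b.id2 FROM B b
    WHERE b.id1 NOT IN (SELECT id1 FROM A)
      AND b.id2 NOT IN (SELECT id2 FROM A)
    ORDER BY b.id1, b.id2
""").fetchall()

# NOT EXISTS compares the whole (id1, id2) pair at once.
missing = conn.execute("""
    SELECT b.id1, b.id2 FROM B b
    WHERE NOT EXISTS (SELECT * FROM A a
                      WHERE a.id1 = b.id1 AND a.id2 = b.id2)
    ORDER BY b.id1, b.id2
""").fetchall()

# Insert only the pairs that are really absent from A.
conn.execute("""
    INSERT INTO A (id1, id2, x)
    SELECT b.id1, b.id2, b.x FROM B b
    WHERE NOT EXISTS (SELECT * FROM A a
                      WHERE a.id1 = b.id1 AND a.id2 = b.id2)
""")
rows_in_a = conn.execute(
    "SELECT id1, id2 FROM A ORDER BY id1, id2").fetchall()

print("naive:", naive)
print("missing:", missing)
print("A after insert:", rows_in_a)
```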
Cade Roux
A: 

I think option B is better, especially if Table A is bigger than Table B by a factor > 1.

If you have indexes on a.id and b.id then joining will be faster, IMHO, than evaluating a WHERE subquery for each row...

Leon
But it depends on the optimiser - if the optimiser does a good job, they'll probably come out the same
AdaTheDev
I agree about the optimizer, but it wouldn't hurt to help him a little bit :)
Leon
A: 

Depending on the number of rows and the activity on the database, it would help a lot to drop all indexes on the table before the insert and recreate them afterwards.
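A rough sketch of that drop-and-recreate pattern, again using an in-memory SQLite database from Python as a stand-in. The secondary index name ix_a_x, the schema and the row counts are made up for illustration. One assumption worth flagging: the sketch keeps the primary key on A.id in place, because the NOT EXISTS check relies on it to look up existing rows, and drops only the secondary index. Whether this is a net win on a real server is something to measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (id INTEGER PRIMARY KEY, x INT, y INT, z INT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, x INT, y INT, z INT);
    CREATE INDEX ix_a_x ON A (x);  -- hypothetical secondary index
""")
conn.executemany("INSERT INTO B VALUES (?, ?, ?, ?)",
                 [(i, i, i, i) for i in range(1, 1001)])

# 1. Drop the secondary index so the bulk insert does not maintain it row by row.
conn.execute("DROP INDEX ix_a_x")

# 2. Copy over only the rows of B that are not yet in A.
conn.execute("""
    INSERT INTO A (id, x, y, z)
    SELECT b.id, b.x, b.y, b.z FROM B b
    WHERE NOT EXISTS (SELECT * FROM A a WHERE a.id = b.id)
""")

# 3. Rebuild the index once, over the final contents of A.
conn.execute("CREATE INDEX ix_a_x ON A (x)")
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM A").fetchone()[0]
index_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'A'")]
print(row_count, index_names)
```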

edoloughlin