views:

2934

answers:

4

I have 2 tables (A and B) with the same primary keys. I want to select all row that are in A and not in B. The following works:

select * from A where not exists (select * from B where A.pk=B.pk);

however it seems quite bad (~2 sec on only 100k rows in A and 3-10k less in B)

Is there a better way to run this? Perhaps as a left join?

select * from A left join B on A.x=B.y where B.y is null;

On my data this seems to run slightly faster (~10%) but what about in general?

+7  A: 

I think your last statement is the best way. You can also try

select *
from A left join B on 
    A.x = B.y
    and B.y is null
Nick Berardi
That doesn't make any sense. When B.y is null, A.x=B.y will never be true. You'll get all rows of A, not just those that have no matching row in B.
Bill Karwin
@Bill and yet it works! Plus it is the exact same thing as the second statement listed above.
Nick Berardi
I changed and to where.
hamstar
+3  A: 

I use queries in the format of your second example. A join is usually more scalable than a correlated subquery.

Bill Karwin
+2  A: 

I also use left joins with a "where table2.id is null" type criteria.

Certainly seems to be more efficient than the nested query option.

Dave Rix
A: 

Joins are generally faster (in MySQL), but you should also consider your indexing scheme if you find that it's still moving slowly. Generally, any field setup as a foreign key (using INNODB) will already have an index set. If you're using MYISAM, make sure that any columns in the ON statement are indexed, and consider also using any columns in the WHERE clause.

Chosun