tags:

views:

115

answers:

3

I'm performing a query that is looking for values in one table that are not in another. For example:

SELECT id FROM table1 WHERE id NOT IN (SELECT id FROM table2);

Both tables have about 1 million rows but only a few hundred that do not match values which are returned in the result set. The query takes about 35 seconds. Doing a show profile on the query shows that mysql is spending most of the time in the state of "preparing". Any ideas on how I can optimize this state or what is actually happening during "preparing"?

The id value in both tables is indexed and of the same type and size.

The entire profile of the query is:

+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| (initialization)               | 0        | 
| checking query cache for query | 0        | 
| Opening tables                 | 0.13     | 
| System lock                    | 0        | 
| Table lock                     | 0        | 
| init                           | 0.01     | 
| optimizing                     | 0        | 
| statistics                     | 0        | 
| preparing                      | 0        | 
| executing                      | 0        | 
| Sending data                   | 0        | 
| optimizing                     | 0        | 
| statistics                     | 0        | 
| preparing                      | 34.83    | 
| end                            | 0        | 
| query end                      | 0        | 
| freeing items                  | 0        | 
| closing tables                 | 0        | 
| logging slow query             | 0        | 
+--------------------------------+----------+

Any tips are appreciated.

Thanks.

+1  A: 

I would left join table2 on id where id is null. This would give you a much faster return of the data.

select
     a.id
from
     table1 a 
     left join table2 b on a.id = b.id and b.id is null
Avitus
glad you chnaged that right join to a left!
Mitch Wheat
Should this be and WHERE b.id is null?
Michael Shnitzer
it is preferable to specify this in the where clause; but the optimiser should produce identical plans.
Mitch Wheat
`@Mitch Wheat`: it is required to specify this in the `WHERE` clause, otherwise the query will return wrong results (just all records from `a`)
Quassnoi
+1  A: 
SELECT id FROM table1
LEFT JOIN table2 ON table1.id = table2.id
WHERE table2.id IS NULL;
Mitch Wheat
The query still takes about 30 seconds but for this "show profile;" most of time is in "Sending Data". Is that just the time it takes for mysql to run the query or is there anything I can do to speed that up?
Michael Shnitzer
it will depend on what indexes you have defined...
Mitch Wheat
+1  A: 

There's nothing to optimize - NOT IN produces an equivalent query plan to LEFT JOIN/IS NULL in MySQL. Quote:

However, these three methods generate three different plans which are executed by three different pieces of code. The code that executes EXISTS predicate is about 30% less efficient than those that execute index_subquery and LEFT JOIN optimized to use Not exists method.

That’s why the best way to search for missing values in MySQL is using a LEFT JOIN / IS NULL or NOT IN rather than NOT EXISTS.

For more information, see NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL: MySQL

OMG Ponies
`table2.id` should be indexed of course to produce the equivalent query. But from its name (which sounds like a `PK`) and the fact that the query takes `35` seconds for `1,000,000` rows we can conclude that it is indexed.
Quassnoi