Update:
These articles in my blog describe the differences between the methods in more detail:
There are three ways to do such a query:
LEFT JOIN / IS NULL
:
SELECT *
FROM common
LEFT JOIN
table1 t1
ON t1.common_id = common.common_id
WHERE t1.common_id IS NULL
NOT EXISTS
:
SELECT *
FROM common
WHERE NOT EXISTS
(
SELECT NULL
FROM table1 t1
WHERE t1.common_id = common.common_id
)
NOT IN
:
SELECT *
FROM common
WHERE common_id NOT IN
(
SELECT common_id
FROM table1 t1
)
When table1.common_id
is not nullable, all these queries are semantically the same.
When it is nullable, NOT IN
is different, since IN
(and, therefore, NOT IN
) return NULL
when a value does not match anything in a list containing a NULL
.
This may be confusing but may become more obvious if we recall the alternate syntax for this:
common_id = ANY
(
SELECT common_id
FROM table1 t1
)
The result of this condition is a boolean product of all comparisons within the list. Of course, a single NULL
value yields the NULL
result which renders the whole result NULL
too.
We never cannot say definitely that common_id
is not equal to anything from this list, since at least one of the values is NULL
.
Suppose we have these data:
common
--
1
3
table1
--
NULL
1
2
LEFT JOIN / IS NULL
and NOT EXISTS
will return 3
, NOT IN
will return nothing (since it will always evaluate to either FALSE
or NULL
).
In MySQL
, in case on non-nullable column, LEFT JOIN / IS NULL
and NOT IN
are a little bit (several percent) more efficient than NOT EXISTS
. If the column is nullable, NOT EXISTS
is the most efficient (again, not much).
In Oracle
, all three queries yield same plans (an ANTI JOIN
).
In SQL Server
, NOT IN
/ NOT EXISTS
are more efficient, since LEFT JOIN / IS NULL
cannot be optimized to an ANTI JOIN
by its optimizer.
In PostgreSQL
, LEFT JOIN / IS NULL
and NOT EXISTS
are more efficient than NOT IN
, sine they are optimized to an Anti Join
, while NOT IN
uses hashed subplan
(or even a plain subplan
if the subquery is too large to hash)