ansaurus

Question

SQL "select where not in subquery" returns no results

Answer 1

A:

Table1 or Table2 has some null values for common_id. Use this query instead:

select *
from Common
where common_id not in (select common_id from Table1 where common_id is not null)
and common_id not in (select common_id from Table2 where common_id is not null)

Jeremy Stein 2009-09-10 16:08:35

What if there's data in one table but not the other? Do you want "and" or "or" there?

Philip Kelley 2009-09-10 16:30:44

I'm looking for records not referenced in any table, so I want AND. I'll clarify the question.

Jeremy Stein 2009-09-10 16:35:08

Answer 2

+3 A:

Just off the top of my head...

select c.commonID, t1.commonID, t2.commonID
from Common c
     left outer join Table1 t1 on t1.commonID = c.commonID
     left outer join Table2 t2 on t2.commonID = c.commonID
where t1.commonID is null 
     and t2.commonID is null

I ran a few tests and here were my results w.r.t. @patmortech's answer and @rexem's comments.

If either Table1 or Table2 is not indexed on commonID, you get a table scan but @patmortech's query is still twice as fast (for a 100K row master table).

If neither are indexed on commonID, you get two table scans and the difference is negligible.

If both are indexed on commonID, the "not exists" query runs in 1/3 the time.

Austin Salonen 2009-09-10 16:20:46

That should be an AND in the where clause. Otherwise, that works.

Jeremy Stein 2009-09-10 16:33:38

changed per your comment. The "or" picks out orphans in either table.

Austin Salonen 2009-09-10 16:35:53

That's better. By the way, is there some reason I should use outer joins rather than the subquery?

Jeremy Stein 2009-09-10 16:37:21

Readability is primary one. I suspect a better execution plan would be generated but without a query plan, I can't confirm.

Austin Salonen 2009-09-10 16:57:53

This approach is worse that using NOT EXISTS - the join results in fetching more rows than it needs, then the results compared for the columns being null. And NOT EXISTS is more readable to boot.

OMG Ponies 2009-09-10 17:29:11

@rexem - you can't make that claim without an execution plan.

David B 2009-09-10 17:31:56

@David: Think about what the query has to do to produce the results. It's OK to think for yourself than to hide behind the execution plan defense. An execution plan is only as good as the data model.

OMG Ponies 2009-09-10 17:36:53

@rexem -- you've argued a point that should now be an answer.

Austin Salonen 2009-09-10 17:44:07

@Austin: That doesn't stop hack answers like these from perpetuating themselves. Don't get me wrong - your answer works, but it's in the realm of putting SELECTs in the SELECT clause.

OMG Ponies 2009-09-10 17:49:27

@rexem -- point conceded. But you could add value to the question by posting a well-explained response.

Austin Salonen 2009-09-10 17:51:40

I tried both ways and, for my data, the subqueries were 5-10% faster.

Jeremy Stein 2009-09-10 17:52:24

Also note that the top-rated answer is **exactly the same** as mine...

Austin Salonen 2009-09-10 17:52:35

@Austin: I've added my comment to that one too. Sorry if you felt you were being made a target.

OMG Ponies 2009-09-10 17:58:17

@rexem -- I still would like to see your answer explained on something I could vote and give you credit for. I used @patmortech's answer and compared with mine and it resulted in nearly identical query plans WRT cost. Both did table scans of T1 his 49%.

Austin Salonen 2009-09-10 18:04:19

@Austin: Appreciated, but patmortech's answer was in before I found the question.

OMG Ponies 2009-09-10 18:11:38

It may have been but it does little to explain why it's better than an alternative.

Austin Salonen 2009-09-10 18:15:10

for most of the stuff i've done using the joins with the where clause were faster than the sub queries. *shrugs*.

DForck42 2009-09-10 20:08:52

Answer 3

+2 A:

SELECT T.common_id
  FROM Common T
       LEFT JOIN Table1 T1 ON T.common_id = T1.common_id
       LEFT JOIN Table2 T2 ON T.common_id = T2.common_id
 WHERE T1.common_id IS NULL
   AND T2.common_id IS NULL

najmeddine 2009-09-10 16:25:36

This approach is worse that using NOT EXISTS - the join results in fetching more rows than it needs, then the results compared for the columns being null. It works, but the performance will not be as good - possibly worse that using IN with correlated subqueries.

OMG Ponies 2009-09-10 17:50:47

Answer 4

+2 A:

select *
from Common c
where not exists (select t1.commonid from table1 t1 where t1.commonid = c.commonid)
and not exists (select t2.commonid from table2 t2 where t2.commonid = c.commonid)

patmortech 2009-09-10 16:37:05

+1 -- see my answer for why.

Austin Salonen 2009-09-10 18:32:52

Answer 5

+2 A:

If you want the world to be a two-valued boolean place, you must prevent the null (third value) case yourself.

Don't write IN clauses that allow nulls in the list side. Filter them out!

common_id not in
(
  select common_id from Table1
  where common_id is not null
)

David B 2009-09-10 17:29:24

You misunderstood the question - re-read.

OMG Ponies 2009-09-10 17:52:08

nulls in the in-clause-list are a common reason for missing query results.

David B 2009-09-11 13:01:35

Actually, I think David B understood the question better than anyone else. I wanted to know WHY no results were returned. Upvoted.

Jeremy Stein 2009-09-11 14:22:20

Answer 6

A:

Let's suppose these values for common_id:

Common - 1
Table1 - 2
Table2 - 3, null

We want the row in Common to return, because it doesn't exist in any of the other tables. However, the null throws in a monkey wrench.

With those values, the query is equivalent to:

select *
from Common
where 1 not in (2)
and 1 not in (3, null)

That is equivalent to:

select *
from Common
where not (1=2)
and not (1=3 or 1=null)

This is where the problem starts. When comparing with a null, the answer is unknown. So the query reduces to

select *
from Common
where not (false)
and not (false or unkown)

false or unknown is unknown:

select *
from Common
where true
and not (unknown)

true and not unkown is also unkown:

select *
from Common
where unknown

The where condition does not return records where the result is unkown, so we get no records back.

One way to deal with this is to use the exists operator rather than in. Exists never returns unkown because it operates on rows rather than columns. (A row either exists or it doesn't; none of this null ambiguity at the row level!)

select *
from Common
where not exists (select common_id from Table1 where common_id = Common.common_id)
and not exists (select common_id from Table2 where common_id = Common.common_id)

Jeremy Stein 2009-09-11 14:12:43

Answer 7

+6 A:

Update:

These articles in my blog describe the differences between the methods in more detail:

There are three ways to do such a query:

LEFT JOIN / IS NULL:

SELECT  *
FROM    common
LEFT JOIN
        table1 t1
ON      t1.common_id = common.common_id
WHERE   t1.common_id IS NULL

NOT EXISTS:

SELECT  *
FROM    common
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    table1 t1
        WHERE   t1.common_id = common.common_id
        )

NOT IN:

SELECT  *
FROM    common
WHERE   common_id NOT IN
        (
        SELECT  common_id
        FROM    table1 t1
        )

When table1.common_id is not nullable, all these queries are semantically the same.

When it is nullable, NOT IN is different, since IN (and, therefore, NOT IN) return NULL when a value does not match anything in a list containing a NULL.

This may be confusing but may become more obvious if we recall the alternate syntax for this:

common_id = ANY
(
SELECT  common_id
FROM    table1 t1
)

The result of this condition is a boolean product of all comparisons within the list. Of course, a single NULL value yields the NULL result which renders the whole result NULL too.

We never cannot say definitely that common_id is not equal to anything from this list, since at least one of the values is NULL.

Suppose we have these data:

common

--
1
3

table1

--
NULL
1
2

LEFT JOIN / IS NULL and NOT EXISTS will return 3, NOT IN will return nothing (since it will always evaluate to either FALSE or NULL).

In MySQL, in case on non-nullable column, LEFT JOIN / IS NULL and NOT IN are a little bit (several percent) more efficient than NOT EXISTS. If the column is nullable, NOT EXISTS is the most efficient (again, not much).

In Oracle, all three queries yield same plans (an ANTI JOIN).

In SQL Server, NOT IN / NOT EXISTS are more efficient, since LEFT JOIN / IS NULL cannot be optimized to an ANTI JOIN by its optimizer.

In PostgreSQL, LEFT JOIN / IS NULL and NOT EXISTS are more efficient than NOT IN, sine they are optimized to an Anti Join, while NOT IN uses hashed subplan (or even a plain subplan if the subquery is too large to hash)

Quassnoi 2009-09-11 14:48:40

Great answer! Thanks!

FailBoy 2009-09-28 07:14:01

Great explanation!

lud0h 2009-10-09 11:51:26

ansaurus

tags:

views:

answers:

SQL "select where not in subquery" returns no results

related questions