ansaurus

Question

Why is my SQL 'NOT IN' clause producing different results from 'NOT EXISTS'

Answer 1

+4 A:

The NOT IN form works differently for NULLs. The presence of a single NULL will cause the entire statement to fail, thus returning no results.

So you have at least one event_id in locations that is NULL.

Also, your query might be better written as a join:

SELECT 
    COUNT(DISTINCT e.event_id)    
FROM
    events AS e  
    LEFT JOIN locations AS l ON e.event_id = l.event_id
WHERE
    l.event_id IS NULL

[UPDATE: apparently, the NOT EXISTS version is faster.]

Mitch Wheat 2010-06-04 11:16:31

+1 Thanks. Out of interest why would it be better written as a join? The Joined version is marginally slower for my dataset but both the exists and the join are treated in a very similar fashion according to the execution plan but the JOIN does a Merge Join (Left Outer) and the EXISTS does a Merge Join (Left Anti Semi)

Steve Weet 2010-06-04 11:51:11

@Steve It wouldn't. See http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/

Martin Smith 2010-06-04 12:02:57

Thanks Martin. That was an excellent article.

Steve Weet 2010-06-04 12:21:50

Wow!, Thanks, Martin. I'll have to revise what I know, and do a spot of testing in SQL Server 2008 R2.

Mitch Wheat 2010-06-04 12:32:21

Is WHERE NOT EXISTS (SELECT 1 FROM locations ...) better than using '*' , or the same plan?

Mitch Wheat 2010-06-04 12:34:18

I'm not so sure about joining on the char(4) columns. What happens when using a 'usual' join on int columns?

Mitch Wheat 2010-06-04 12:36:31

@Mitch - RE: `SELECT 1` That's something I'm not sure about. I believe this used to make a difference but now they will use exactly the same plan but I have noticed Phil Factor still uses it (e.g. http://www.simple-talk.com/community/blogs/philfactor/archive/2009/08/03/74227.aspx). I'm not sure if this is out of habit or for some other reason.

Martin Smith 2010-06-04 12:40:50

Answer 2

+3 A:

You have a NULL in the subquery SELECT l.event_id FROM locations AS l so NOT IN will always evaluate to unknown and return 0 results

SELECT COUNT(DISTINCT e.event_id) 
FROM   events AS e
WHERE  e.event_id NOT IN (SELECT  l.event_id FROM locations AS l)

The reason for this behaviour can be seen from the below example.

'x' NOT IN (NULL,'a','b')

≡ 'x' <> NULL and 'x' <> 'a' and 'x' <> 'b'

≡ Unknown and True and True

≡ Unknown

Martin Smith 2010-06-04 11:17:18

Answer 3

A:

In and Exists are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select * 
  from t1, ( select distinct y from t2 ) t2
 where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then joined to the original table -- typically.

As opposed to

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:

for x in ( select * from t1 )
   loop
      if ( exists ( select null from t2 where y = x.x )
      then 
         OUTPUT THE RECORD
      end if
   end loop

It always results in a full scan of T1 whereas the first query can make use of an index on T1(x).

Bharat 2010-06-04 11:25:49

In my particular case the query optimizer uses exactly the same query plan for both queries.

Steve Weet 2010-06-04 11:55:17

ansaurus

tags:

views:

answers:

Why is my SQL 'NOT IN' clause producing different results from 'NOT EXISTS'

related questions