views:

55

answers:

2

I spent some time trying to figure out why this query isn't pulling the results i expected:

SELECT * FROM NGS WHERE ESPSSN NOT IN (SELECT SSN FROM CENSUS)

finally i tried writing the query another way and this ended up getting the expected results:

SELECT * FROM NGS n WHERE NOT EXISTS (SELECT * FROM CENSUS WHERE SSN = n.ESPSSN)

The first query seems more appropriate and "correct". I use "in" and "not in" all the time for similar selects and have never had a problem that i know of.

+5  A: 

If you write out the syntactic sugar, x not in (1,2,3) becomes:

x <> 1 AND x <> 2 AND x <> 3

So if the ssn column contains a null value, the first query is the equivalent of:

WHERE ESPSSN <> NULL AND ESPSSN <> ...

The result of the comparison with NULL is unknown, so the query would not return anything.

Andomar
Good call! I just checked and there is a null value in the table. Rewriting the query to SELECT * FROM NGS WHERE ESPSSN NOT IN (SELECT ISNULL(SSN,'') FROM CENSUS) gave me the expected value. I wonder which should i use?
Josh
@Josh: I default to `not exists` because it doesn't suffer from the `null` gotcha. But if you have performance issues, it can be worth examining the query plan for both options
Andomar
@Josh, @Andomar: The ISNULL will invalidate use of any index that may have been used before
gbn
+1  A: 

As Andomar said, beware of NULL values when using NOT IN

Also note that a query using the NOT IN predicate will always perform nested full table scans, whereas a query using NOT EXISTS can use an index within the sub-query, and be much faster as a result.

Vincent Buck
Good to know. Thanks.
Josh