views:

461

answers:

9
+10  Q: 

"<>" vs "NOT IN"

I was debugging a stored procedure the other day and found some logic something like this:

SELECT something
FROM someTable
WHERE idcode <> (SELECT ids FROM tmpIdTable)

This returned nothing. I thought it looked a little odd with the "<>" so I changed it to "NOT IN" and then everything worked fine. I was wondering why this is? This is a pretty old proc and I am not really sure how long the issue has been around, but we recently switched from SQL Server 2005 to SQL Server 2008 when this was discovered. What is the real difference between "<>" and "NOT IN" and has the behavior changed between Server2005 and 2008?

+6  A: 

<> is a "singular" NOT operation; NOT IN is a set operation, so it makes sense that the former wouldn't work. I have no idea whether or not it may have done so under a previous version of SQL Server, however.

Rob
It never has been done any different in SQL Server.
Tomalak
+4  A: 

I'm not 100% sure but my first guess would be not <> is looking to only compare singles values where ad not in compares a value to a list of values.

DForck42
+6  A: 
SELECT something
FROM someTable
WHERE idcode NOT IN (SELECT ids FROM tmpIdTable)

checks against any value in the list.

However, the NOT IN is not NULL-tolerant. If the sub-query returned a set of values that contained NULL, no records would be returned at all. (This is because internally the NOT IN is optimized to idcode <> 'foo' AND idcode <> 'bar' AND idcode <> NULL etc., which will always fail because any comparison to NULL yields UNKNOWN, preventing the whole expression from ever becoming TRUE.)

A nicer, NULL-tolerant variant would be this:

SELECT something
FROM someTable
WHERE NOT EXISTS (SELECT ids FROM tmpIdTable WHERE ids = someTable.idcode)


EDIT: I initially assumed that this:

SELECT something
FROM someTable
WHERE idcode <> (SELECT ids FROM tmpIdTable)

would check against the first value only. It turns out that this assumption is wrong at least for SQL Server, where it actually triggers his error:

Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Tomalak
That makes a lot of sense.
DJ
+11  A: 

try this, may run faster because of index usage:

SELECT something
FROM someTable
    LEFT OUTER JOIN tmpIdTable ON idcode=ids
WHERE ids IS NULL
KM
+1 good alternative
Dead account
Thats a good idea!
DJ
SQL Server normally (I haven't seen a case when it hasn't) produces identical QEP's for a join or a sub-query, and therefore identical performance.
pipTheGeek
That is the right answer +1
wcm
+2  A: 

I have no idea why would you write something like WHERE idcode <> (SELECT ids FROM tmpIdTable). A SELECT statement will return a set of tuples and your idcode either will or will NOT be IN this set. "WHERE idcode NOT IN (SELECT ids FROM tmpIdTable)" is the way to do it.

Peter Perháč
Thats why i knew to fix it, I just didn't really know exactly what the logic behind it was.
DJ
The <> statement may make sense if the sub-query is ordered in some way, e.g. "<> the largest [thing] in the list".
Tomalak
You're right, in case the sub-query is ordered the <> notation could make some sense, but I would avoid it anyhow, as it's prone to error -- easy to overlook its intent. Whoever removes or changes the ordering of the sub-query later on may be baffled by why the query returns nothing.
Peter Perháč
True. For disambiguation I would always put in a TOP 1 or MAX()/GROUP BY or similar in the sub-query. Just using <> against a set of values is too much side-effect dependent for my share.
Tomalak
Hm... To err is human, I guess. :-\ I really thought that <> (SELECT ...) was valid, but at least SQL Server does not accept it when the sub-query returns more than a single value. It's no good practice anyway.
Tomalak
+3  A: 

in some versions of SQL != should be used for a "not equals" logical statement. Have you tried that?

Vuk
'<>' and '!=' are equivalent on SQL Server. There is no version that insists on '!='. But '<>' is the ANSI SQL standard way of doing it, though personally I tend to use != more often for some reason.
Tomalak
+3  A: 

This code is valid if and only if there are no rows or a single row returned from tmpIdTable:

SELECT something
FROM someTable
WHERE idcode <> (SELECT ids FROM tmpIdTable)

If multiple rows are returned, you will get an error like:

Msg 512, Level 16, State 1, Line 1 Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

This is the same error you get with nested scalar unexpectedly produces multiple rows like:

SELECT *, (SELECT blah FROM t1 WHERE etc.) FROM t2

Nothing has changed WRT this in SQL Server in a decade, so I expect assumptions about the nested query in the original code have been broken.

If no rows are returned, the result will be empty since <> NULL is never true (assume ANSI NULLs).

This code is valid for any number of rows:

SELECT something
FROM someTable
WHERE idcode NOT IN (SELECT ids FROM tmpIdTable)

However, there still can be issues with NULL.

Cade Roux
I actually had this same conversation this morning with a coworker. Its similar, but not quite the answer.
DJ
Conceptually, at least, it's even wrong if the subselect returns zero or one row. Because you're asking if a scalar, idcode, is equal to a zero- or one- element list. The scalar is never actually equal to the list, even if it's equal to the only element in a one-element list.
Carl Manaster
Indeed, I never use the construct and I would typically think of it as a code smell.
Cade Roux
+2  A: 

If the SELECT subquery returns zero rows, that's a NULL. When NULL is compared to anything, the result is always UNKNOWN, and never TRUE. Confusingly enough, NOT UNKNOWN is equal to UNKNOWN.

I avoid three valued logic (TRUE, FALSE, UNKNOWN) whenever possible. It's not that hard to avoid once you get the hang of it.

If the SELECT subquery returns exactly one value, the comparison for inequality should return the result you expect.

If the SELECT subquery returns more than one value, you should get an error.

In general, NOT IN will return the result you expect when you are testing for non membership in a set.

This response overlaps other responses, but it's phrased a little differently.

Edited to add more detail about NOT IN:

I did some searching about NOT IN in Oracle, and I learned something I didn't know a half an hour ago. NOT IN is NULL sensitive. In particular,

X NOT IN (SELECT ...)

Is not the same as

NOT (X IN SELECT ...))

I may have to amend my earlier response!

Walter Mitty
+1 for pointing out the TRUE/FALSE/UNKNOWN part. Thanks.
Tomalak