ansaurus

Question

"<>" vs "NOT IN"

Answer 1

+6 A:

<> is a "singular" NOT operation; NOT IN is a set operation, so it makes sense that the former wouldn't work. I have no idea whether or not it may have done so under a previous version of SQL Server, however.

Rob 2009-05-13 14:40:14

It has never worked any differently in SQL Server.

Tomalak 2009-05-13 14:42:11

Answer 2

+4 A:

I'm not 100% sure, but my first guess would be that <> only compares single values, whereas NOT IN compares a value to a list of values.

DForck42 2009-05-13 14:41:09

Answer 3

+6 A:

SELECT something
FROM someTable
WHERE idcode NOT IN (SELECT ids FROM tmpIdTable)

checks against any value in the list.

However, NOT IN is not NULL-tolerant. If the sub-query returns a set of values that contains NULL, no records are returned at all. (This is because NOT IN is logically expanded to idcode <> 'foo' AND idcode <> 'bar' AND idcode <> NULL etc., which can never succeed because any comparison to NULL yields UNKNOWN, preventing the whole expression from ever becoming TRUE.)

A nicer, NULL-tolerant variant would be this:

SELECT something
FROM someTable
WHERE NOT EXISTS (SELECT ids FROM tmpIdTable WHERE ids = someTable.idcode)
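
The difference between the two variants can be checked with a small script. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, and the sample rows are made up for illustration. The table and column names are the ones used in the queries above.

```python
import sqlite3

# In-memory SQLite database (a stand-in for SQL Server; sample data is invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE someTable (idcode TEXT, something TEXT);
    CREATE TABLE tmpIdTable (ids TEXT);
    INSERT INTO someTable VALUES ('foo', 'a'), ('bar', 'b'), ('baz', 'c');
    INSERT INTO tmpIdTable VALUES ('foo'), (NULL);  -- note the NULL
""")

# NOT IN: the NULL in the list makes the predicate UNKNOWN for every
# non-matching row, so nothing is returned.
not_in_rows = conn.execute("""
    SELECT something
    FROM someTable
    WHERE idcode NOT IN (SELECT ids FROM tmpIdTable)
""").fetchall()

# NOT EXISTS: the NULL row simply never matches, so 'bar' and 'baz' survive.
not_exists_rows = conn.execute("""
    SELECT something
    FROM someTable
    WHERE NOT EXISTS (SELECT ids FROM tmpIdTable WHERE ids = someTable.idcode)
    ORDER BY something
""").fetchall()

print(not_in_rows)      # []
print(not_exists_rows)  # [('b',), ('c',)]
```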

EDIT: I initially assumed that this:

SELECT something
FROM someTable
WHERE idcode <> (SELECT ids FROM tmpIdTable)

would check against the first value only. It turns out that this assumption is wrong, at least for SQL Server, where it actually triggers this error:

Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

Tomalak 2009-05-13 14:43:46

That makes a lot of sense.

DJ 2009-05-13 14:48:57

Answer 4

+11 A:

Try this; it may run faster because of index usage:

SELECT something
FROM someTable
    LEFT OUTER JOIN tmpIdTable ON idcode=ids
WHERE ids IS NULL
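
A quick check that this join form returns the rows one would expect, even when tmpIdTable contains a NULL. Again a sketch only: it runs against an in-memory SQLite database via Python's sqlite3 module rather than SQL Server, the sample rows are invented, and it says nothing about performance or index usage.

```python
import sqlite3

# In-memory SQLite database (a stand-in for SQL Server; sample data is invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE someTable (idcode TEXT, something TEXT);
    CREATE TABLE tmpIdTable (ids TEXT);
    INSERT INTO someTable VALUES ('foo', 'a'), ('bar', 'b'), ('baz', 'c');
    INSERT INTO tmpIdTable VALUES ('foo'), (NULL);
""")

# Rows of someTable with no partner in tmpIdTable come back from the
# outer join with ids = NULL; the WHERE clause keeps exactly those rows.
anti_join_rows = conn.execute("""
    SELECT something
    FROM someTable
        LEFT OUTER JOIN tmpIdTable ON idcode = ids
    WHERE ids IS NULL
    ORDER BY something
""").fetchall()

print(anti_join_rows)  # [('b',), ('c',)]
```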

KM 2009-05-13 14:45:13

+1 good alternative

Dead account 2009-05-13 14:48:10

That's a good idea!

DJ 2009-05-13 14:53:46

SQL Server normally (I haven't seen a case where it hasn't) produces identical query execution plans (QEPs) for a join and a sub-query, and therefore identical performance.

pipTheGeek 2009-05-13 15:50:38

That is the right answer +1

wcm 2009-05-22 18:44:11

Answer 5

+2 A:

I have no idea why you would write something like WHERE idcode <> (SELECT ids FROM tmpIdTable). A SELECT statement will return a set of tuples, and your idcode either will or will NOT be IN this set. "WHERE idcode NOT IN (SELECT ids FROM tmpIdTable)" is the way to do it.

Peter Perháč 2009-05-13 14:45:17

That's why I knew how to fix it; I just didn't know exactly what the logic behind it was.

DJ 2009-05-13 14:48:27

The <> statement may make sense if the sub-query is ordered in some way, e.g. "<> the largest [thing] in the list".

Tomalak 2009-05-13 14:48:30

You're right, in case the sub-query is ordered the <> notation could make some sense, but I would avoid it anyhow, as it's prone to error -- easy to overlook its intent. Whoever removes or changes the ordering of the sub-query later on may be baffled by why the query returns nothing.

Peter Perháč 2009-05-13 14:57:05

True. For disambiguation I would always put a TOP 1 or MAX()/GROUP BY or similar in the sub-query. Just using <> against a set of values depends too much on side effects for my taste.
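
A minimal sketch of the MAX() variant, so the sub-query is guaranteed to return exactly one value. Assumptions: Python's sqlite3 module with an in-memory SQLite database stands in for SQL Server, the integer ids and the rows are invented, and the table and column names are borrowed from the question.

```python
import sqlite3

# In-memory SQLite database (a stand-in for SQL Server; sample data is invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE someTable (idcode INTEGER, something TEXT);
    CREATE TABLE tmpIdTable (ids INTEGER);
    INSERT INTO someTable VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tmpIdTable VALUES (2), (3);
""")

# MAX() collapses the sub-query to a single value (3 here), so the
# scalar comparison with <> is well defined.
rows = conn.execute("""
    SELECT something
    FROM someTable
    WHERE idcode <> (SELECT MAX(ids) FROM tmpIdTable)
    ORDER BY something
""").fetchall()

print(rows)  # [('a',), ('b',)]
```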

Tomalak 2009-05-13 15:40:00

Hm... To err is human, I guess. :-\ I really thought that <> (SELECT ...) was valid, but at least SQL Server does not accept it when the sub-query returns more than a single value. It's not good practice anyway.

Tomalak 2009-05-13 15:51:14

Answer 6

+3 A:

In some versions of SQL, != should be used for a "not equals" comparison. Have you tried that?

Vuk 2009-05-13 14:51:30

'<>' and '!=' are equivalent on SQL Server. There is no version that insists on '!='. But '<>' is the ANSI SQL standard way of doing it, though personally I tend to use != more often for some reason.

Tomalak 2009-05-13 14:58:16

Answer 7

+3 A:

This code is valid if and only if there are no rows or a single row returned from tmpIdTable:

SELECT something
FROM someTable
WHERE idcode <> (SELECT ids FROM tmpIdTable)

If multiple rows are returned, you will get an error like:

Msg 512, Level 16, State 1, Line 1
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.

This is the same error you get when a nested scalar subquery unexpectedly produces multiple rows, as in:

SELECT *, (SELECT blah FROM t1 WHERE etc.) FROM t2

Nothing has changed WRT this in SQL Server in a decade, so I expect assumptions about the nested query in the original code have been broken.

If no rows are returned, the result will be empty, since <> NULL is never true (assuming ANSI NULLs).

This code is valid for any number of rows:

SELECT something
FROM someTable
WHERE idcode NOT IN (SELECT ids FROM tmpIdTable)

However, there still can be issues with NULL.

Cade Roux 2009-05-13 15:48:19

I actually had this same conversation this morning with a coworker. It's similar, but not quite the answer.

DJ 2009-05-13 16:04:03

Conceptually, at least, it's even wrong if the subselect returns zero or one row, because you're asking whether a scalar, idcode, is equal to a zero- or one-element list. The scalar is never actually equal to the list, even if it's equal to the only element in a one-element list.

Carl Manaster 2009-05-13 16:46:57

Indeed, I never use the construct and I would typically think of it as a code smell.

Cade Roux 2009-05-13 21:44:21

Answer 8

+2 A:

If the SELECT subquery returns zero rows, that's a NULL. When NULL is compared to anything, the result is always UNKNOWN, and never TRUE. Confusingly enough, NOT UNKNOWN is equal to UNKNOWN.
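
The three-valued logic described here can be observed directly. This is a sketch under assumptions: Python's sqlite3 module with an in-memory SQLite database is used in place of SQL Server, tmpIdTable is deliberately left empty, and the literal 'foo' is just a sample value. SQL NULL (and therefore UNKNOWN) shows up as None in Python.

```python
import sqlite3

# In-memory SQLite database (a stand-in for SQL Server).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmpIdTable (ids TEXT)")  # left empty on purpose

row = conn.execute("""
    SELECT
        'foo' <> (SELECT ids FROM tmpIdTable),        -- empty sub-query acts as NULL
        NOT ('foo' <> (SELECT ids FROM tmpIdTable)),  -- NOT UNKNOWN is still UNKNOWN
        'foo' <> NULL                                 -- direct comparison to NULL
""").fetchone()

print(row)  # (None, None, None)
```

Since a WHERE clause only keeps rows for which the condition is TRUE, none of these three expressions would let a row through.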

I avoid three valued logic (TRUE, FALSE, UNKNOWN) whenever possible. It's not that hard to avoid once you get the hang of it.

If the SELECT subquery returns exactly one value, the comparison for inequality should return the result you expect.

If the SELECT subquery returns more than one value, you should get an error.

In general, NOT IN will return the result you expect when you are testing for non membership in a set.

This response overlaps other responses, but it's phrased a little differently.

Edited to add more detail about NOT IN:

I did some searching about NOT IN in Oracle, and I learned something I didn't know half an hour ago. NOT IN is NULL sensitive. In particular,

X NOT IN (SELECT ...)

is not the same as

NOT (X IN (SELECT ...))

I may have to amend my earlier response!

Walter Mitty 2009-05-13 16:01:22

+1 for pointing out the TRUE/FALSE/UNKNOWN part. Thanks.

Tomalak 2009-05-13 16:41:59

Answer 9

+1 A:

Queries using NOT IN may be brittle:

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/10/21/defensive-database-programming-rewriting-queries-with-not-in.aspx

AlexKuznetsov 2009-05-18 20:06:09
