views:

588

answers:

6

I was reading this article: http://stackoverflow.com/questions/191640/get-null-null-in-sql

And the consensus is that when trying to test equality between two (nullable) sql columns, the right approach is:

where ((A=B) OR (A IS NULL AND B IS NULL))

When A and B are NULL, (A=B) still returns FALSE, since NULL is not equal to NULL. That is why the extra check is required.

What about when testing inequalities? Following from the above discussion, it made me think that to test inequality I would need to do something like:

WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))

However, I noticed that that is not necessary (at least not on informix 11.5), and I can just do:

where (A<>B)

If A and B are NULL, this returns FALSE. If NULL is not equal to NULL, then shouldn't this return TRUE?

EDIT
These are all good answers, but I think my question was a little vague. Allow me to rephrase:

Given that either A or B can be NULL, is it enough to check their inequality with

where (A<>B)

Or do I need to explicitly check it like this:

WHERE ((A <> B) OR (A IS NOT NULL AND B IS NULL) OR (A IS NULL AND B IS NOT NULL))

REFER to this thread for the answer to this question.

+2  A: 

The short answer is... NULLs are weird, they don't really behave like you'd expect.

Here's a great paper on how NULLs work in SQL. I think it will help improve your understanding of the topic. I think the sections on handling null values in expressions will be especially useful for you.

http://www.oracle.com/technology/oramag/oracle/05-jul/o45sql.html

Robert Greiner
+6  A: 

All comparisons involving null are undefined, and evaluate to false. This idea, which is what prevents null being evaluated as equivalent to null, also prevents null being evaluated as NOT equivalent to null.

Adam Robinson
Not sure who downvoted this. Any comparison involving `null`s other than `is` WILL return false on a DB that follows the SQL standard.
Donnie
@Donnie: Thanks, I was wondering where I was wrong.
Adam Robinson
I did. any comparison involving null will return NULL
just somebody
@just somebody: Yes, the "value" of the comparison is null, but since that "value" can never be truly inspected, its practical valuation is false.
Adam Robinson
It's so much easier to figure out what a given comparsion is going to do, though, if you know the underlying rule: Anything involving a null operand, apart from the special `IS NULL` and `IS NOT NULL` tests, produces NULL.
Jason Orendorff
@Jason: I fail to see how the statement that "a comparison involving one or more `null`s always evaluates to null" is more meaningful (let alone helpful) than "a comparison involving one or more `null`s always evaluates to false". You have to make the next step of "because it evaluates to null, it *does not evaluate to true*, making is--essentially--false". Please explain how this is clearer.
Adam Robinson
+4  A: 

relational expressions involving NULL actually yield NULL again

edit

here, <> stands for arbitrary binary operator, NULL is the SQL placeholder, and value is any value (NULL is not a value):

  • NULL <> value -> NULL
  • NULL <> NULL -> NULL

the logic is: NULL means "no value" or "unknown value", and thus any comparison with any actual value makes no sense.

is X = 42 true, false, or unknown, given that you don't know what value X holds? SQL says it's unknown. is X = Y true, false, or unknown, given that you don't know either value? SQL says it's unknown again. and it says so for any binary relational operation, which is only logical (even if having NULLs in the model is not in the first place).

SQL also provides two unary postfix operators, IS NULL and IS NOT NULL, these return TRUE or FALSE according to the value of their operand. NULL IS NULL -> TRUE NULL IS NOT NULL -> TRUE

just somebody
+18  A: 

Because that behavior follows established ternary logic where NULL is considered an unknown value.

If you think of NULL as unknown, it becomes much more intuitive:

Is unknown a equal to unknown b? There's no way to know, so: unknown.

Ben S
Great perspective! Another way to look at it is that if "unknown a" equals "unknown b", then they can't really be that unknown, since you know something about them. Kind of quantum mechanic-y.
womp
@womp: Indeed, you know that you don't know ;)
Adam Robinson
+2  A: 

The default (ANSI) behaviour of nulls within an expression will result in a null (there are enough other answers with the cases of that).

There are however some edge cases and caveats that I would place when dealing with MS Sql Server that are not being listed.

  • Nulls within a statement that is grouping values together will be considered equal and be grouped together.
  • Null values within a statement that is ordering them will be considered equal.
  • Null values selected within a statement that is using distinct will be considered equal when evaluating the distinct aspect of the query

It is possible in SQL Server to override the expression logic regarding the specific Null = Null test, using the SET ANSI_NULLS OFF, which will then give you equality between null values - this is not a recommended move, but does exist.

SET ANSI_NULLS OFF

select result =
    case
     when  null=null then 'eq' 
     else 'ne'
    end

SET ANSI_NULLS ON

select result =
    case
     when  null=null then 'eq' 
     else 'ne'
    end
Andrew
A: 

"Is unknown a equal to unknown b? There's no way to know, so: unknown."

The question was : why does the comparison yield FALSE ?

Given three-valued logic, it would indeed be sensible for the comparison to yield UNKNOWN (not FALSE). But SQL does yield FALSE, and not UNKNOWN.

One of the myriads of perversities in the SQL language.

Furthermore, the following must be taken into account :

If "unkown" is a logical value in ternary logic, then it ought to be the case that an equality comparison between two logical values that both happen to be (the value for) "unknown", then that comparison ought to yield TRUE.

If the logical value is itself unknown, then obviously that cannot be represented by putting the value "unknown" there, because that would imply that the logical value is known (to be "unknown"). That is, a.o., how relational theory proves that implementing 3-valued logic raises the requirement for a 4-valued logic, that a 4 valued logic leads to the need for a 5-valued logic, etc. etc. ad infinitum.

Erwin Smout