views:

104

answers:

6

So I have a simple query that returns a listing of products

SELECT     Model, CategoryID
FROM         Products
WHERE     (Model = '010-00749-01') 

This returns

010-00749-01    00000000-0000-0000-0000-000000000000
010-00749-01    NULL

Whitch is correct, so I wanted only get the products whose CategoryID is not '00000000-0000-0000-0000-000000000000' so I have

SELECT     Model, CategoryID
FROM         Products
WHERE     (Model = '010-00749-01') 
AND (CategoryID <> '00000000-0000-0000-0000-000000000000') 

But this returns no results. So I changed the query to

SELECT     Model, CategoryID
FROM         Products
WHERE     (Model = '010-00749-01') 
AND ((CategoryID <> '00000000-0000-0000-0000-000000000000') OR  (CategoryID  IS NULL))

Which returns expected result

010-00749-01    NULL

Can someone explain this behavior to me? MS SQL Server 2008

+2  A: 

Check out the full reference on Books Online - by default ANSI_NULLS is on meaning you'd need to use the approach you have done. Otherwise, you could switch that setting OFF at the start of the query to switch the behaviour round.

When SET ANSI_NULLS is ON, a SELECT statement that uses WHERE column_name = NULL returns zero rows even if there are null values in column_name. A SELECT statement that uses WHERE column_name <> NULL returns zero rows even if there are nonnull values in column_name.
...
When SET ANSI_NULLS is ON, all comparisons against a null value evaluate to UNKNOWN. When SET ANSI_NULLS is OFF, comparisons of all data against a null value evaluate to TRUE if the data value is NULL.

Here's a simple example to demonstrate the behaviour with regard to comparisons against NULL:

-- This will print TRUE
SET ANSI_NULLS OFF;
IF NULL <> 'A'
    PRINT 'TRUE'
ELSE
    PRINT 'FALSE'

-- This will print FALSE
SET ANSI_NULLS ON;
IF NULL <> 'A'
    PRINT 'TRUE'
ELSE
    PRINT 'FALSE'
AdaTheDev
Be careful, setting ANSI_NULLS to `off` will decrease performance and increase logical reads. Turning it on affects the query plan.
vol7ron
@Vol7ron Really merited a downvote?! That v harsh for what is a correct answer. If you honestly feel it warrants a downvote, fair enough but imho seems odd and not an honest vote
AdaTheDev
What merited a downvote? I don't think it merits a down or an up. It's an answer. I wouldn't go so far to say it's "correct" either. Implementing this will take a performance hit, but more importantly it is technically not SQL standard, the moment you implement that, you are straying from ISO. One of the best things about SQL is the ability to go from platform to platform, you're giving that up here. `Coalesce` is a better solution. As I've stated, if you're worried about the performance impact then index that field w/ the function, otherwise clean up the underlying data.
vol7ron
.... or use the proper `IS NULL` or `IS NOT NULL` syntax.
vol7ron
A: 

Null gets special treatment. You need to explicitly test for null. See http://msdn.microsoft.com/en-us/library/ms188795.aspx

Michael Levy
A: 

look at this:

1=1        --true
1=0        --false
null=null  --false
null=1     --false

1<>1       --false
1<>0       --true
null<>null --false
null<>1    --false    <<<--why you don't get the row with: AND (CategoryID <> '00000000-0000-0000-0000-000000000000') 
KM
Technically SQL uses a three-valued logic system, so all of your comparisons with NULL yield UNKNOWN rather than FALSE. See: [SQL and the Snare of Three-Valued Logic](http://www.simple-talk.com/sql/learn-sql-server/sql-and-the-snare-of-three-valued-logic/).
Joe Stefanelli
call it `false` or `unknown`, it doesn't really matter, because the row is not included in the result set, which is what the question was all about.
KM
A: 

Basically, a NULL is the absence of any value. So trying to compare the NULL in CategoryId to a varchar value in the query will always result in a false evaluation.

You might want to try using the COALESCE function, something like:

SELECT     ModelId, CategoryID 
FROM       Products 
WHERE      (ModelId = '010-00749-01')  
AND        ( COALESCE( CategoryID, '' ) <> '00000000-0000-0000-0000-000000000000' ) 

EDIT

As noted by AdaTheDev the COALESCE function will negate any indices that may exist on the CategoryID column, which can affect the query plan and performance.

Bob Mc
Be careful with this approach, as the COALESCE could result in a less optimal execution plan by preventing an index seek on that column
AdaTheDev
Ada is correct, the COALESCE does have an effect on the execution plan. Will edit my answer to include.
Bob Mc
+1  A: 

You may try using the Coalesce function to set a default value for fields that have null:

   SELECT    Model , CategoryID
   FROM      Products
   WHERE     Model = '010-00749-01'
     AND     Coalesce(CategoryID,'') <> '00000000-0000-0000-0000-000000000000'

I think the problem lies in your understanding of NULL which basically means "nothing." You can't compare anything to nothing, much like you can't divide a number by 0. It's just rules of math/science.

Edit: As Ada has pointed out, this could cause an indexed field to no longer use an index.

Solution:

  • You can create an index using the coalesce function: eg create index ... coalesce(field)
  • You can add a not null constraint to prevent NULLs from ever appearing
  • A de facto standard of mine is to always assign default values and never allow nulls
vol7ron
Be careful with this approach, as the COALESCE could result in a less optimal execution plan by preventing an index seek on that column
AdaTheDev
This could be true, however, you can create an index as `coalesce(field)`, or add a not null constraint to prevent NULLs from ever appearing. A de facto standard of mine is to always assign default values and never allow nulls.
vol7ron
A: 

In general, you have to remember that NULL generally means UNKNOWN. That means if you say CategoryID <> '00000000-0000-0000-0000-000000000000' you have to assume that the query will only return values that it KNOWS will meet your criteria. Since there is a NULL (UNKNOWN) result, it does not actually know if that record meets your criteria and therefore will not be returned in the dataset.

Joe Philllips