views:

167

answers:

8

I've got the following SQL :

CREATE TABLE tbFoo(
    a varchar(50) NULL,
) 


CREATE NONCLUSTERED INDEX IX_tbFoo_a ON tbFoo
(
    a ASC
)
WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON)

insert into tbFoo select null
insert into tbFoo select 'test'

The two following queries work fine and use my index as expected :

select * from tbFoo where a='test'
select * from tbFoo where a is null

Now, let's pretend I want to store my comparison value in a variable, like this :

declare @a varchar(50)
select @a = NULL

The following query won't return the expected results if @a is null because I should use the "is" operator rather than "="

select * from tbFoo where a=@a

The following will work but will do a table scan if @a is null (because of the 'test' row which forces the evaluation of the second parenthesis)

select * from tbFoo where (a is null and @a is null) or (a=@a)

Eventually, I've came up with this solution, which works fine and uses my index :

select * from tbFoo where (a is null and @a is null) or (@a is not null and a=@a)

Is my analysis of the situation correct?

Is there a better way to handle this situation ?

+1  A: 

Nothing is ever "equal to" NULL... which is sort of the point of NULL.

Your solution will work fine. I'm surprised at how the query optimizer handles the shorter version. I would think that testing a for NULL before testing equality with a table scan would be a no-brainer.

richardtallent
Except when ansi_nulls off
Jeff O
@Guiness: good point! But in the sense of database *theory*, NULL is just NULL, even if the RDBMS lets you fudge that. For maximum maintainability, I think the OP's solution is preferable in most cases to forcing the database into a non-standard mode. Maybe if he was doing a few dozen similar comparisons within the same stored procedure, the pendulum would swing the opposite direction.
richardtallent
+3  A: 

Eventually, I came up with this solution, which works fine and uses my index :

In SQL Server 2008, you can define a filtered index based on a predicate that excludes NULLs:

CREATE UNIQUE NONCLUSTERED INDEX IX_tbFoo_a 
ON tbFoo (a)
WHERE a IS NOT NULL;
Vincent Buck
A: 

This what I do. It's very flexible. I am assuming that @a is an argument for the sproc. 'somethingweird' can be something that you will never see in your recordset '~~~' or whatever.

set @a = isnull(@a,'somethingweird')
select * from tbFoo where isnull(a,'somethingweird')=@a
Praesagus
That probably prevents the optimizer using an index on 'a'...
Jonathan Leffler
+1  A: 

another possibility is using setting ansi nulls to off

set ansi_nulls off

declare @a varchar(50)
select @a = NULL

select * from tbFoo where a=@a

set ansi_nulls on

Just keep in mind that you are breaking away from the default behavior here

SQLMenace
A: 

I don't have an instance at home to play with, but I can see the table scan getting very annoying. A possible alternative is to use UNION in place of the OR operator...

select * from tbFoo where (a is null and @a is null)
UNION ALL
select * from tbFoo where (a=@a and @a is not null)

(I'm not sure exactly what effect the "@a is not null" will have on performance, but my gut feel would be to include it. It's a constant expression that should allow the optimiser to know when the whole condition always fails. My technique is always to play and see what works best.)

I find this UNION trick to have two properties:
- It can significantly improve performance by simplfying queries
- It balloons code with multiple joins and cause major maintenance headaches

But then, life is jus a balancing act :)

Dems
A: 

Just ISNULL both sides like so...

DECLARE @random VARCHAR(50)
SELECT  @random = 'text that never appears in your table'

SELECT * FROM @tbFoo WHERE ISNULL(a, @random) = ISNULL(@a, @random)
Jon Erickson
I don't think this will use the indexes effectively. NULL and non-null values can be matched much more efficiently using an index than coalescing them to a constant and then comparing them.
richardtallent
+1  A: 

Perhaps your database engine optimizes what you've got automatically, but it seems to me like the following would be more efficient:

if @a IS NULL
    select * from tbFoo where a is null
else
    select * from tbFoo where a = @a

My reasoning for this is that you would perform the if @a IS NULL condition only once, rather than checking it for every line in the database. Again, though, a quality database engine should be able to convert your code into the same kind of data plan as this.

StriplingWarrior
A: 

Your analysis is correct - and is why 3-valued logic makes life difficult.

The suggestion from @StriplingWarrior is good; it finesses the problem by executing different SQL depending on whether the variable is null or not. Where that is not possible, your long-winded solution which repeatedly uses the host variable is necessary.

Jonathan Leffler