views:

1381

answers:

4

I'm using MS Sql 2005.

Why does this give me the correct results (returns 169 rows)...

select
  * 
from 
  [import_Data] 
where 
  [import_Data].name not in 
  (
    select 
      [import_Data].name 
    from 
      [import_Data] 
      inner join [resource] on [import_Data].name = [resource].name
    where 
      [import_Data].ProviderTypeID = 4 
      and [resource].IsDeleted = 0
  )
  and [import_Data].ProviderTypeID = 4

But this doesn't (returns 0 rows)...

select 
  * 
from 
  [import_Data] 
where 
  [import_Data].name not in 
  (
    select 
      [resource].name 
    from 
      [resource] 
    where 
      IsDeleted = 0
  ) 
  and [import_Data].ProviderTypeID = 4

The only difference between the name columns is that [resource].name is varchar(500) and [import_Data].name is varchar(300).

+2  A: 

You probably have a null in the name column of the [resource] table somewhere. The behaviour of = and not in are different when nulls are involved.

Try:

select * from [resource] where name is null and IsDeleted = 0

to narrow it down.

Or to fix:

select * from [import_Data]
where 
  [import_Data].name not in (
    select name from [resource] where IsDeleted = 0 and name is not null
    ) 
  and [import_Data].ProviderTypeID = 4

Also see http://stackoverflow.com/questions/129077/sql-not-in-constraint-and-null-values

xahtep
+4  A: 

My guess is that you face a difference in how the IN operator and the = operator work internally.

How about this:

select 
  * 
from 
  [import_Data] 
where 
  not exists 
  (
    select 1 
    from   [resource] 
    where  name = [import_Data].Name and IsDeleted = 0
  ) 
  and [import_Data].ProviderTypeID = 4
Tomalak
It was definitely an issue with one of the values for [resource].name in my database being null. As soon as I corrected for that it fixed the problem.
Aaron Palmer
Yes, it's clear to me now as well. The NOT EXISTS correlated sub-query is still more recommendable from my POV, I hardly ever us NOT IN with sub-queries.
Tomalak
+4  A: 

My guess is that there is a null resource.name in your resource table which is throwing all the comparisons off. Why do nulls cause a problem? Per "Guru's Guid to TSQL" and I'm paraphrasing "ANSI guidelines state that an expression that compares a value for equality to NULL always returns NULL." So any null in the list throws the whole thing off.

In your first query your inner join is excluding those nulls.

So you have three options

  • Don't use NOT IN, use NOT EXISTS
  • Use ISNULL on resource.name to eliminate the nulls
  • Change the null handling behaviour by setting ANSI_NULLS OFF.

An example of not exists using a correlated subquery (warning aircode)

SELECT *
FROM [import_Data] 
WHERE NOT EXISTS(
        select [resource].name from [resource] where IsDeleted = 0 AND [resource].name = [import_Data].name
    )
  AND [import_Data].ProviderTypeID = 4
Will Rickards
@Stanislas Biron's answer has the missing part. Only "NOT IN" fails on NULL in the list. "IN" works fine with NULL in the list. +1 still.
Tomalak
+2  A: 

In fact the problem is that it looks like there are NULL values in your [resource].name column.

The "NOT IN" t-sql clause will not work when there are NULL values in the provided list of values.

This is because your where will resolve as :

[import_Data].name <> 'Resource1' and [import_Data].name <> 'Resource2' 
and [import_Data].name <> null

when the query engine resolves the [import_Data].name <> null, it returns an UNKNOWN value and the expression always evaluates to false. Hence you never get any row.

For example, you will always get 0 rows if you execute

select * from [import_Data] where [import_Data].name <> null

So, considering this, the following query will probably work in your case:

select * from [import_Data] 
where [import_Data].name not in (
    select [resource].name from [resource] where IsDeleted = 0 
    and [resource].name is not null
) and [import_Data].ProviderTypeID = 4
Stanislas Biron