tags:

views:

159

answers:

5

Is the following the most efficient in SQL to achieve its result:

SELECT * 
  FROM Customers 
 WHERE Customer_ID NOT IN (SELECT Cust_ID FROM SUBSCRIBERS)

Could some use of joins be better and achieve the same result?

+1  A: 

Maybe try this

Select cust.*

From dbo.Customers cust
Left Join dbo.Subscribers subs on cust.Customer_ID = subs.Customer_ID
Where subs.Customer_Id Is Null
Barry
+5  A: 

Any mature enough SQL database should be able to execute that just as effectively as the equivalent JOIN. Use whatever is more readable to you.

Matti Virkkunen
+1 this is correct - SQL Server turns 'NOT IN' and 'NOT EXISTS' type queries into the same execution plan.
eddiegroves
+6  A: 
SELECT Customers.* 
  FROM Customers 
 WHERE NOT EXISTS (
       SELECT *
         FROM SUBSCRIBERS AS s
         JOIN s.Cust_ID = Customers.Customer_ID) 

When using “NOT IN”, the query performs nested full table scans, whereas for “NOT EXISTS”, the query can use an index within the sub-query.

Ardman
Depends on the database - SQL Server will generate the same execution plan and do index seeks (where indexes exist)
eddiegroves
A: 

One reason why you might prefer to use a JOIN rather than NOT IN is that if the Values in the NOT IN clause contain any NULLs you will always get back no results. If you do use NOT IN remember to always consider whether the sub query might bring back a NULL value!

RE: Question in Comments

'x' NOT IN (NULL,'a','b')

≡ 'x' <> NULL and 'x' <> 'a' and 'x' <> 'b'

≡ Unknown and True and True

≡ Unknown

Martin Smith
Are you saying that SELECT 'A' WHERE 'x' NOT IN (NULL,'a','b') would return an empty result?
Craig Johnston
@Craig - Yes Exactly.
Martin Smith
A: 

If you want to know which is more effective, you should try looking at the estimated query plans, or the actual query plans after execution. It'll tell you the costs of the queries (I find CPU and IO cost to be interesting). I wouldn't be surprised much if there's little to no difference, but you never know. I've seen certain queries use multiple cores on our database server, while a rewritten version of that same query would only use one core (needless to say, the query that used all 4 cores was a good 3 times faster). Never really quite put my finger on why that is, but if you're working with large result sets, such differences can occur without your knowing about it.

Rob