Ok, so I have a query:

select distinct(a)
from mytable
where
b in (0,3)

Which is going to be faster, the above or

select distinct(a)
from mytable
where
b = 0
or
b = 3

Is there a general rule?

Thanks

+5  A: 

As far as I know, IN is converted to a series of OR conditions, so the performance is the same. It's just a shorter way of writing it.
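A minimal sketch to check that the two forms at least return the same rows. It uses Python's built-in sqlite3 module with an in-memory table as a stand-in; the sample rows are made up, and SQLite is an assumption here, not the engine from the question. It only checks results and says nothing about timing.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for the asker's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO mytable (a, b) VALUES (?, ?)",
    [(1, 0), (1, 3), (2, 0), (3, 5), (4, 3)],
)

in_rows = conn.execute(
    "SELECT DISTINCT a FROM mytable WHERE b IN (0, 3)"
).fetchall()
or_rows = conn.execute(
    "SELECT DISTINCT a FROM mytable WHERE b = 0 OR b = 3"
).fetchall()

# Sort before comparing, since neither query has an ORDER BY.
same_rows = sorted(in_rows) == sorted(or_rows)
print("same rows:", same_rows)
```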

Raze2dust
+5  A: 

In this simple example it hopefully won't make any difference which version you use, as the query optimiser should turn them into equivalent queries under the hood. However, there's a fair chance the outcome depends on the indexes you have on mytable. I would suggest that you run both queries in SQL Server Management Studio with "Include Actual Execution Plan" turned on, and compare the results to see which query has the lower "cost" in your scenario.

To do this:

  1. Put your queries into a new SQL Server Management Studio query window
  2. Right-click on the window in the space you've typed into
  3. Click "Include Actual Execution Plan"
  4. Run your query as you would usually

The bottom "results" half of the window will now show a third tab, "Execution Plan", which should contain two "flowcharts": one for the first query and another for the second. If the two are identical, then SQL Server has treated the two queries as equivalent, and you should choose whichever form you and/or your colleagues prefer.
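The same compare-the-plans idea can be tried outside Management Studio on any engine that exposes its plan. The sketch below is only an analogue: it assumes SQLite (via Python's sqlite3 module) and its EXPLAIN QUERY PLAN statement rather than SQL Server, and the index name idx_mytable_b is hypothetical. Plans from one engine do not carry over to another, so treat it as an illustration of the method, not of SQL Server's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
# Hypothetical index, added so the plan has something to choose from.
conn.execute("CREATE INDEX idx_mytable_b ON mytable (b)")


def plan(sql):
    """Return the human-readable detail text of each plan step."""
    # EXPLAIN QUERY PLAN rows have four columns; the last is the detail text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_in = plan("SELECT DISTINCT a FROM mytable WHERE b IN (0, 3)")
plan_or = plan("SELECT DISTINCT a FROM mytable WHERE b = 0 OR b = 3")

for step in plan_in:
    print("IN:", step)
for step in plan_or:
    print("OR:", step)

# If this prints True, this engine planned both forms the same way.
print("identical plans:", plan_in == plan_or)
```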

Rob
+1. FWIW, I did this and the plans are indeed identical.
harpo
@harpo, yep, it was my long-winded way of saying "here's how to work it out yourself, as I know it'll be the same but you'll get more out of discovering that" to the OP =)
Rob
+2  A: 

Both IN and OR will do a lookup for b = 0 followed by one for b = 3, then merge the two result sets, and finally filter out any duplicates.

With IN, duplicates don't really make sense, because b can't be both 0 and 3. The fact is, though, that IN will be converted to b = 0 OR b = 3, and with OR, duplicates do make sense: you could have b = 0 OR a = 3, and if you were to combine the two separate result sets, you could end up with a duplicate for each record that matched both criteria.

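So duplicate filtering will always be done, regardless of whether you're using IN or OR. However, if you know from the outset that you will not have any duplicates - which is usually the case when you're using IN - then you can gain some performance by using UNION ALL, which doesn't filter out duplicates. Note that because each branch applies distinct(a) on its own, the query below returns the same rows as the original only if no value of a occurs with both b = 0 and b = 3: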

select distinct(a)
from mytable
where
b = 0

UNION ALL

select distinct(a)
from mytable
where
b = 3
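
A small check of when the UNION ALL form matches the IN form, sketched with Python's sqlite3 module and made-up rows (SQLite is an assumption here, though the result follows from how DISTINCT and UNION ALL are defined rather than from any one engine). The value a = 1 is deliberately stored with both b = 0 and b = 3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (a INTEGER, b INTEGER)")
# Hypothetical rows: a = 1 appears with both b = 0 and b = 3.
conn.executemany(
    "INSERT INTO mytable (a, b) VALUES (?, ?)",
    [(1, 0), (1, 3), (2, 0)],
)

in_rows = conn.execute(
    "SELECT DISTINCT a FROM mytable WHERE b IN (0, 3)"
).fetchall()

# DISTINCT is applied per branch, so a value present in both branches survives twice.
union_all_rows = conn.execute(
    "SELECT DISTINCT a FROM mytable WHERE b = 0 "
    "UNION ALL "
    "SELECT DISTINCT a FROM mytable WHERE b = 3"
).fetchall()

print(sorted(in_rows))         # [(1,), (2,)]
print(sorted(union_all_rows))  # [(1,), (1,), (2,)]
```

With data like this, the UNION ALL version is only a safe replacement if the extra row is acceptable or if such overlaps cannot occur in the table.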
David Hedlund