tags:

views:

198

answers:

6

Hi,

When dealing with big databases, which performs better, IN or OR in the SQL Where-clause?

Is there any difference about the way they are executed?

A: 

My first guess would be that OR performs better, unless the SQL engine converts IN into OR behind the scene. Have you seen the query plan of these two?

Raj
A: 

OR makes sense (from readability point of view), when there are less values to be compared. IN is useful esp. when you have a dynamic source, with which you want values to be compared.

Another alternative is to use a JOIN with a temporary table.
I don't think performance should be a problem, provided you have necessary indexes.

shahkalpesh
+7  A: 

The best way to find out is looking at the Execution Plan.


I tried it with Oracle, and it was exactly the same.

CREATE TABLE performance_test AS ( SELECT * FROM dba_objects );

SELECT * FROM performance_test
WHERE object_name IN ('DBMS_STANDARD', 'DBMS_REGISTRY', 'DBMS_LOB' );

Even though the query uses IN, the Execution Plan says that it uses OR:

--------------------------------------------------------------------------------------    
| Id  | Operation         | Name             | Rows  | Bytes | Cost (%CPU)| Time     |    
--------------------------------------------------------------------------------------    
|   0 | SELECT STATEMENT  |                  |     8 |  1416 |   163   (2)| 00:00:02 |    
|*  1 |  TABLE ACCESS FULL| PERFORMANCE_TEST |     8 |  1416 |   163   (2)| 00:00:02 |    
--------------------------------------------------------------------------------------    

Predicate Information (identified by operation id):                                       
---------------------------------------------------                                       

   1 - filter("OBJECT_NAME"='DBMS_LOB' OR "OBJECT_NAME"='DBMS_REGISTRY' OR                
              "OBJECT_NAME"='DBMS_STANDARD')                                              
Peter Lang
What happens in Oracle if you have more than 3 values that you are testing? Do you know if Oracle is unable to perform the same binary search optimization as MySQL or does it perform it in both cases?
Mark Byers
@Mark Byers: I tried the same query with 10 values, still the same result. Note, that the optimizer resorted my values in alphabetical order. I would not be surprised if Oracle did some internal optimization of that filter...
Peter Lang
Oracle also has an `INLIST ITERATOR` operation, which it would select if there were an index it could use. Still, when I tried it out, both `IN` and `OR` end up with the same execution plan.
Cheran S
+11  A: 

I assume you want to know the performance difference between the following:

WHERE foo IN ('a', 'b', 'c')
WHERE foo = 'a' OR foo = 'b' OR foo = 'c'

According to the manual for MySQL if the values are constant IN sorts the list and then uses a binary search. I would imagine that OR evaluates them one by one in no particular order. So IN is faster in some circumstances.

The best way to know is to profile both on your database with your specific data to see which is faster.

I tried both on a MySQL with 1000000 rows. When the column is indexed there is no discernable difference in performance - both are nearly instant. When the column is not indexed I got these results:

SELECT COUNT(*) FROM t_inner WHERE val IN (1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000);
1 row fetched in 0.0032 (1.2679 seconds)

SELECT COUNT(*) FROM t_inner WHERE val = 1000 OR val = 2000 OR val = 3000 OR val = 4000 OR val = 5000 OR val = 6000 OR val = 7000 OR val = 8000 OR val = 9000;
1 row fetched in 0.0026 (1.7385 seconds)

So in this case the method using OR is about 30% slower. Adding more terms makes the difference larger. Results may vary on other databases and on other data.

Mark Byers
If the optimizer is worth its salt they should perform the same.
inflagranti
@inflagranti: No optimizer is perfect unfortunately. Optimizers are extremely complex programs and each implementation will have its own strengths and weaknesses. This is why I say you should profile on a specific implementation. I'd imagine that the extra structure of the `IN` method makes it easier to optimize than a whole bunch of possibly related `OR` clauses. I'd be surprised if there is an engine where the `OR` method is faster, but I'm not surprised that there are times when OR is slower.
Mark Byers
+3  A: 

I think oracle is smart enough to convert the less efficient one (whichever that is) into the other. So I think the answer should rather depend on the readability of each (where I think that IN clearly wins)

soulmerge
A: 

The OR operator needs a much more complex evaluation process than the IN construct because it allows many conditions, not only equals like IN.

Here is a like of what you can use with OR but that are not compatible with IN: greater. greater or equal, less, less or equal, LIKE and some more like the oracle REGEXP_LIKE. In addition consider that the conditions may not always compare the same value.

For the query optimizer it's easier to to manage the IN operator because is only a construct that defines the OR operator on multiple conditions with = operator on the same value. If you use the OR operator the optimizer may not consider that you're always using the = operator on the same value and, if it doesn't perform a deeper and very much more complex elaboration, it could probably exclude that there may be only = operators for the same values on all the involved conditions, with a consequent preclusion of optimized search methods like the already mentioned binary search.

[EDIT] Probably an optimizer may not implement optimized IN evaluation process, but this doesn't exclude that one time it could happen(with a database version upgrade). So if you use the OR operator that optimized elaboration will not be used in your case.

Alessandro Rossi