I'm using Hibernate's JPA implementation with MySQL 5.0.67. MySQL is configured to use InnoDB.

In performing a JPA query (which is translated to SQL), I've discovered that using the IN clause is slower than performing individual queries. Example:

SELECT p FROM Person p WHERE p.name IN ('Joe', 'Jane', 'Bob', 'Alice')

is slower than four separate queries:

SELECT p FROM Person p WHERE p.name = 'Joe'
SELECT p FROM Person p WHERE p.name = 'Jane'
SELECT p FROM Person p WHERE p.name = 'Bob'
SELECT p FROM Person p WHERE p.name = 'Alice'

Why is this? Is this a MySQL performance limitation?

+1  A: 

If you're using the IN operator, it's not much different than saying:

(p.name = 'Joe' OR p.name = 'Jane' OR p.name = 'Bob' OR p.name = 'Alice')

Those are four conditions that must be checked for every row the query considers, whereas each of the individual queries you cite has only one condition. Even so, I doubt that running four such queries would be faster in most real-world scenarios, since you also have to count the time it takes your client to read four result sets and do something with them. In that case, IN looks pretty nice; even better if it can use an index.

Peter
A: 

Are you measuring wall-clock time or query execution time? My guess is that the actual execution time for each of the four individual queries may add up to less than the time to execute the IN query, but the overall wall-clock time will be much longer for the four queries.

It will help to have an index on the name column.
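
As a rough sketch of that suggestion: assuming the Person entity maps to a table called person with a name column (both names are guesses, and idx_person_name is made up for illustration), the index could be created and checked like this:

```sql
-- Hypothetical table, column, and index names; adjust to the real mapping.
CREATE INDEX idx_person_name ON person (name);

-- Lists the indexes defined on the table, to confirm the new one exists.
SHOW INDEX FROM person;
```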

Barry Brown
I am measuring total elapsed time.
Steve Kuo
+6  A: 

This is a known deficiency in MySQL.

It is often true that using UNION performs better than a range query like the one you show. MySQL doesn't employ indexes very intelligently for expressions using IN (...). A similar hole exists in the optimizer for boolean expressions with OR.

See http://www.mysqlperformanceblog.com/2006/08/10/using-union-to-implement-loose-index-scan-to-mysql/ for some explanation and detailed benchmarks.

The optimizer is being improved all the time. A deficiency in one version of MySQL may be improved in a subsequent version. So it's worth testing your queries on different versions.

It is also advantageous to use UNION ALL instead of simply UNION. Both queries use a temporary table to store results, but the difference is that UNION applies DISTINCT to the result set, which incurs an additional un-indexed sort.
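
For the query in the question, the UNION ALL form might look like the sketch below. It is written as plain SQL rather than JPQL, and it assumes a table named person with an indexed name column (hypothetical names, since the question only shows the entity).

```sql
-- Assumed schema: table `person` with an index on `name` (names are illustrative).
-- Each branch is a simple equality lookup that can use the index on its own.
SELECT * FROM person WHERE name = 'Joe'
UNION ALL
SELECT * FROM person WHERE name = 'Jane'
UNION ALL
SELECT * FROM person WHERE name = 'Bob'
UNION ALL
SELECT * FROM person WHERE name = 'Alice';
```

UNION ALL is safe here because each branch matches a different name, so no row can be returned twice and the DISTINCT step would do no useful work.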

Bill Karwin
Does SQL Server have this deficiency?
Steve Kuo
I'm not a SQL Server user, so I don't know the answer. Best thing to do is try both query forms in Query Analyzer, and use the Show Execution Plan menu option.
Bill Karwin
No, SQL Server does not have this problem.
Matt Rogish
+1  A: 

A query as simple as the IN demonstrated shouldn't have an issue with the optimizer choosing to use the index. The UNION work mentioned by Bill is only required occasionally when you have more complex queries. It could be an issue with index statistics.

Have you done an ANALYZE on the table in question?

How many rows are in the table and how many match the IN clause?

What does EXPLAIN say for the queries in question?
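
A minimal sketch of how those checks could be run, assuming the table is called person and the column name (both hypothetical, since the question only shows the JPA entity):

```sql
-- Refresh the index statistics the optimizer relies on.
ANALYZE TABLE person;

-- How many rows are there in total, and how many match the IN list?
SELECT COUNT(*) FROM person;
SELECT COUNT(*) FROM person WHERE name IN ('Joe', 'Jane', 'Bob', 'Alice');

-- Compare the plan for the IN query with the plan for a single-value query.
EXPLAIN SELECT * FROM person WHERE name IN ('Joe', 'Jane', 'Bob', 'Alice');
EXPLAIN SELECT * FROM person WHERE name = 'Joe';
```

In the EXPLAIN output, the key column shows which index was chosen (NULL means none), and the type column shows the access method: ALL is a full table scan, range is an index range scan, and ref is an equality lookup on a non-unique index.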

Harrison Fisk
A: 

For me, the IN clause is an invaluable tool even if it lags slightly behind individual queries: it frees the database and tables for use by other connections, and it has benefits for application structure.

The following technique is utilized in almost every PHP/MySQL application I construct.

I use the IN clause quite a bit with numerical keys:

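For example, grabbing four master items and all their subitems could be:

$master_arr = array(); // $db is assumed to be an open mysqli connection
$master_res = mysqli_query($db, "SELECT * FROM master WHERE master_id IN (1,7,9,10)");
while ($row = mysqli_fetch_assoc($master_res)) {
    $master_arr[$row['master_id']] = $row; // key rows by master_id
}

then:

$subitem_res = mysqli_query($db, "SELECT * FROM subitems WHERE par_master_id IN (1,7,9,10)");

then add the subitems to the master items:

// Attach each subitem under its parent, keyed by subitem_id.
while ($sv = mysqli_fetch_assoc($subitem_res)) {
    $m_key = $sv['par_master_id'];
    $s_key = $sv['subitem_id'];
    $master_arr[$m_key]['subitem'][$s_key] = $sv;
}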

This does two things: (1) the tables are not all held at once by a join, and (2) only two MySQL queries produce a tree of data.

Fire Crow