I have a 2 GB MySQL table with 500k rows, and I run the following query on an otherwise idle system:

select * from mytable 
where name in ('n1', 'n2', 'n3', 'n4', ... bunch more... ) 
order by salary

It uses a filesort and takes between 50 and 70 seconds to complete.

When I remove the ORDER BY salary and do the sorting in the application instead, the total runtime (including the sort) drops to about 25-30 seconds. But that's still far too slow.

Any idea how I can speed this up?

Thank you.

+5  A: 

Put the list of names into a temporary table and then inner join the two tables. This is much faster than checking every row against that entire list. Here's the SQL (the name list is elided):

create temporary table names
    (name varchar(255));

insert into names values ('n1'),('n2'),...,('nn');

select
    a.*
from
    mytable a
    inner join names b on
        a.name = b.name
order by
    a.salary;

Also note that mytable.name should be indexed; that makes the join much faster. Thanks to Thomas for pointing this out.
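A minimal, runnable sketch of the temporary-table approach, assuming Python's built-in sqlite3 module as a stand-in for MySQL (SQLite's planner and syntax differ from MySQL's in places). The schema, the sample rows, and the helper names `make_db` and `fetch_by_names` are hypothetical. It loads the wanted names into a temporary table, joins on it, orders by salary, and checks the result against the plain IN query.

```python
import sqlite3


def make_db():
    """Build a small in-memory stand-in for `mytable` (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
    )
    rows = [(f"n{i % 10}", (i * 37) % 1000) for i in range(1000)]
    conn.executemany("INSERT INTO mytable (name, salary) VALUES (?, ?)", rows)
    # Index on the join column, as the answer and the comments recommend.
    conn.execute("CREATE INDEX idx_mytable_name ON mytable(name)")
    return conn


def fetch_by_names(conn, names):
    """Return (id, name, salary) rows for `names`, ordered by salary, via a temp-table join."""
    # Recreate the temporary table so the function can be called repeatedly.
    conn.execute("DROP TABLE IF EXISTS temp.names")
    # PRIMARY KEY gives the temporary table its own index on name.
    conn.execute("CREATE TEMPORARY TABLE names (name TEXT PRIMARY KEY)")
    conn.executemany("INSERT INTO names (name) VALUES (?)", [(n,) for n in set(names)])
    return conn.execute(
        "SELECT a.id, a.name, a.salary FROM mytable a "
        "INNER JOIN names b ON a.name = b.name "
        "ORDER BY a.salary, a.id"  # id only breaks ties, for a stable order
    ).fetchall()


conn = make_db()
wanted = ["n1", "n3", "n7"]
joined = fetch_by_names(conn, wanted)

# The equivalent long-IN-list query, for comparison.
placeholders = ", ".join("?" for _ in wanted)
with_in = conn.execute(
    f"SELECT id, name, salary FROM mytable WHERE name IN ({placeholders}) "
    "ORDER BY salary, id",
    wanted,
).fetchall()
assert joined == with_in
```

Whether the join actually beats IN on a real MySQL server depends on the version, indexes and data, so it is worth timing both against the production table.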

Eric
Make sure the original table has an index on name to make the most of the join.
Thomas Jones-Low
Thank you, this is a bit faster: about 25 s for the same queries. But 25 s is still not much of an improvement...
CharlesS
With or without the ORDER BY? With it, the query is 50% faster; without it, only slightly faster. You will want an index on name alone in both tables. Then run EXPLAIN on the query to see what it is doing.
Thomas Jones-Low
Could you provide some references showing that "IN" queries are a bad choice? In my experience with MySQL (5.0+), performance is very similar whether you use a join, an equality (name = 'xxx'), or "IN". I think the most important factors are correct indexing and server configuration.
Leonel Martins
A: 
create index xyz on mytable(name(6));

"IN" queries are almost always inefficient, as they are conceptually processed like this:

select * from mytable where name = 'n1'
or name = 'n2'
or name = 'n3'
...

With the index above, the query optimizer may access the rows through the index instead of doing a full table scan.
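To see this on a toy table, here is a sketch, again assuming Python's sqlite3 as a stand-in for MySQL and a hypothetical schema. It checks that the IN list and the equivalent OR chain return the same rows, then uses SQLite's EXPLAIN QUERY PLAN (not MySQL's EXPLAIN, whose output looks different) to show a table scan before the index exists and an index search after. SQLite has no prefix indexes, so it indexes the full `name` column rather than `name(6)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO mytable (name, salary) VALUES (?, ?)",
    [(f"n{i % 50}", i) for i in range(5000)],  # hypothetical data, unique salaries
)

IN_SQL = "SELECT * FROM mytable WHERE name IN ('n1', 'n2', 'n3') ORDER BY salary"
OR_SQL = ("SELECT * FROM mytable WHERE name = 'n1' OR name = 'n2' OR name = 'n3' "
          "ORDER BY salary")


def plan(sql):
    """Return the 'detail' column (last column) of SQLite's EXPLAIN QUERY PLAN rows."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# IN and the OR chain select the same rows.
assert conn.execute(IN_SQL).fetchall() == conn.execute(OR_SQL).fetchall()

before = plan(IN_SQL)   # no index on name yet: expect a full table SCAN
conn.execute("CREATE INDEX xyz ON mytable(name)")
after = plan(IN_SQL)    # with the index: expect a SEARCH ... USING INDEX xyz

print(before)
print(after)
```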

tpdi
Could you provide some references showing that "IN" queries are almost always inefficient? In my experience with MySQL (5.0+), they perform the same as a join or an equality (name = 'xxx'). I think the most important factors are correct indexing and server configuration.
Leonel Martins
+1  A: 

Some ideas:

  • Do you need to select *, or can you get away with selecting only a subset of the columns?
  • If you can select a subset, you could add a covering index that is already sorted by salary.
  • If all the names you want share the same pattern, you could use LIKE 'n%' instead of the IN list.
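A sketch of these ideas, under the same assumptions (Python's sqlite3 standing in for MySQL; the schema, data and index name `idx_salary_name` are hypothetical). It selects only the needed columns, adds a covering index whose leading column is salary so the rows can be read from the index already in salary order, and shows the LIKE prefix variant. Whether the optimizer actually prefers that index over a sort depends on the engine and the data, so the sketch prints the plan rather than assuming it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, bio TEXT)"
)
# Hypothetical data: rows built from odd i get names n1, n3, ..., n19;
# rows built from even i get m0, m2, ..., m18. `bio` is a wide column we avoid reading.
conn.executemany(
    "INSERT INTO mytable (name, salary, bio) VALUES (?, ?, ?)",
    [(("n" if i % 2 else "m") + str(i % 20), (i * 7919) % 10007, "x" * 200)
     for i in range(2000)],
)
# Covering index: holds every column the query below touches, sorted by salary.
conn.execute("CREATE INDEX idx_salary_name ON mytable(salary, name)")

# Select a subset of columns instead of *, so the query can be answered from the index.
subset_sql = (
    "SELECT name, salary FROM mytable "
    "WHERE name IN ('n1', 'n3', 'n5') ORDER BY salary"
)
rows = conn.execute(subset_sql).fetchall()

# Inspect how SQLite chose to run it (last column of EXPLAIN QUERY PLAN).
for row in conn.execute("EXPLAIN QUERY PLAN " + subset_sql):
    print(row[-1])

# Prefix match: only equivalent to the IN list when every wanted name matches 'n%'.
prefix_rows = conn.execute(
    "SELECT name, salary FROM mytable WHERE name LIKE 'n%' ORDER BY salary"
).fetchall()
```

Note that LIKE 'n%' here returns every n-name, not just n1, n3 and n5, which is why it only helps when the pattern describes exactly the set you want.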
Sam Saffron
+1  A: 

Try selecting the rows you want using a subquery, and then order the results of that subquery. See this question.

And you do have an index on name in mytable, right?
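A minimal illustration of the subquery form, again assuming Python's sqlite3 as a stand-in for MySQL and a hypothetical schema: the inner query does the filtering and the outer query only sorts its result. Treat it as a rewrite to try and measure, not a guaranteed speedup; an optimizer may flatten such a subquery back into the plain query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("CREATE INDEX idx_mytable_name ON mytable(name)")  # the index on name
conn.executemany(
    "INSERT INTO mytable (name, salary) VALUES (?, ?)",
    [(f"n{i % 10}", (i * 3) % 1000) for i in range(1000)],  # hypothetical, unique salaries
)

wanted = ("n2", "n4", "n8")
placeholders = ", ".join("?" for _ in wanted)

# Inner query filters by name; outer query sorts the (smaller) filtered result.
sub_sql = (
    "SELECT * FROM ("
    f"  SELECT id, name, salary FROM mytable WHERE name IN ({placeholders})"
    ") AS t ORDER BY t.salary"
)
# The original single-level form, for comparison.
plain_sql = (
    f"SELECT id, name, salary FROM mytable WHERE name IN ({placeholders}) "
    "ORDER BY salary"
)

sub_rows = conn.execute(sub_sql, wanted).fetchall()
plain_rows = conn.execute(plain_sql, wanted).fetchall()
```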

+1  A: 

Depending on the data distribution and the number of rows your WHERE clause matches, you may want to try an index on (salary, name), or even (name, salary), although the latter will most likely not be very useful for this kind of query.

You may also want to increase your sort_buffer_size setting. Test each change separately and compare the output of EXPLAIN.

Josh Davis