I'm going through an application and trying to optimize some queries and I'm really struggling with a few of them. Here's an example:
SELECT `Item` . * , `Source` . * , `Keyword` . * , `Author` . * FROM `items` AS `Item` JOIN `sources` AS `Source` ON ( `Item`.`source_id` = `Source`.`id` ) JOIN `authors` AS `Author` ON ( `Item`.`author_id` = `Author`.`id` ) JOIN `items_keywords` AS `ItemsKeyword` ON ( `Item`.`id` = `ItemsKeyword`.`item_id` ) JOIN `keywords` AS `Keyword` ON ( `Keyword`.`id` = `ItemsKeyword`.`keyword_id` ) JOIN `keywords_profiles` AS `KeywordsProfile` ON ( `Keyword`.`id` = `KeywordsProfile`.`keyword_id` ) JOIN `profiles` AS `Profile` ON ( `Profile`.`id` = `KeywordsProfile`.`profile_id` ) WHERE `KeywordsProfile`.`profile_id` IN ( 17 ) GROUP BY `Item`.`id` ORDER BY `Item`.`timestamp` DESC , `Item`.`id` DESC LIMIT 0 , 20;
This one is taking 10-30 seconds...in the tables referenced, there are about 500k author rows, and about 750k items and items_keywords rows. Everything else is less than 500 rows.
Here's the explain output: http://img.skitch.com/20090220-fb52wd7jf58x41ikfxaws96xjn.jpg
EXPLAIN is relatively new to me, but I went through this line by line and it all seems fine. Not sure what else I can do, as I've got indexes on everything...what am I missing?
The server this sits on is just a 256 slice over at slicehost, but there's nothing else running on it and the CPU is at 0% before its run. And yet still it cranks away on this query. Any ideas?
EDIT: Some further info; one of the things that makes this really frustrating is that if I repeatedly run this query, it takes less than .1 seconds. I'm assuming this is due to the query cache, but if I run RESET QUERY CACHE before it, it still runs extremely quickly. It's only after I wait a little while or run some other queries that the 10-30 second times return. All the tables are MyISAM...does this indicate that MySQL is loading stuff into memory and that's why it runs so much faster for awhile?
EDIT 2: Thanks so much to everyone for your help...an update...I cut everything down to this:
SELECT i.id FROM items AS i ORDER BY i.timestamp DESC, i.id DESC LIMIT 0, 20;
Consistently took 5-6 seconds, despite there only being 750k records in the DB. Once I dropped the 2nd column on the ORDER BY clause, it was pretty much instant. There's obviously several things going on here, but when I cut the query down to this:
SELECT i.id FROM items AS i JOIN items_keywords AS ik ON ( i.id = ik.item_id ) JOIN keywords AS k ON ( k.id = ik.keyword_id ) JOIN keywords_profiles AS kp ON ( k.id = kp.keyword_id ) WHERE kp.profile_id IN (139) ORDER BY i.timestamp DESC LIMIT 20;
It's still taking 10+ seconds...what else can I do?
Minor curiosity: on the explain, the rows column for items_keywords is always 1544, regardless of what profile_id I'm using in the query. shouldn't it change depending on the number of items associated with that profile?
EDIT 3: Ok, this is getting ridiculous :). If I drop the ORDER BY clause entirely, things are very speedy and the temp table / filesort disappears from explain. There's currently an index on the item.timestamp column, but is it not being used for some reason? I thought I remembered something about mysql only using one index per table or something? should I create a multi-column index over all the columns on the items table that this query references (source_id, author_id, timestamp, etc)?