I have a MySQL table (articles) with a nested (composite) index on (blog_id, published), and queries against it perform poorly. I see a lot of entries like this in my slow query log:
- Query_time: 23.184007 Lock_time: 0.000063 Rows_sent: 380 Rows_examined: 6341
SELECT id from articles WHERE category_id = 11 AND blog_id IN (13,14,15,16,17,18,19,20,21,22,23,24,26,27,6330,6331,8269,12218,18889) order by published DESC LIMIT 380;
I have trouble understanding why MySQL would run through all the rows with those blog_ids to figure out my top 380 rows. I would expect the whole purpose of the nested index to be speeding that up. At the very least, even a naive implementation should look up each blog_id and get its top 380 rows ordered by published. That should be fast, since the nested index lets us locate exactly those 380 rows for each blog. It could then sort the resulting 19*380=7220 rows.
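To make the naive strategy concrete, here is a minimal Python sketch of what I have in mind. It is not what MySQL does internally; the `index` dict is a hypothetical in-memory stand-in for the (blog_id, published) index, and it ignores the category_id filter, which is not part of that index.

```python
# Hypothetical stand-in for the (blog_id, published) index: for each blog_id,
# a list of (published, id) pairs already sorted by published, newest first.
def top_k_naive(index, blog_ids, k):
    candidates = []
    for blog_id in blog_ids:
        # Taking the first k entries of one blog's slice is cheap,
        # because the index already keeps them in published order.
        candidates.extend(index.get(blog_id, [])[:k])
    # At most len(blog_ids) * k candidates are left to sort.
    candidates.sort(reverse=True)
    return [article_id for _published, article_id in candidates[:k]]


# Made-up sample data, for illustration only.
index = {
    13: [(900, 5), (700, 3), (100, 1)],
    14: [(800, 4), (600, 2)],
}
print(top_k_naive(index, [13, 14], 3))
```

With 19 blogs and k = 380, this touches at most 19*380 index entries no matter how many articles each blog has.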
If one were to implement it optimally, one would build a heap over the set of per-blog_id streams, pick the stream with the max(published), and repeat that 380 times. Each operation should be fast.
I'm surely missing something, since Google, Facebook, Twitter, Microsoft and all the big companies use MySQL in production. Does anyone have experience with this?
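Roughly, the heap-based merge I am describing would look like the Python sketch below. Again, this only illustrates the idea and is not a claim about MySQL's implementation; the `index` dict is a hypothetical stand-in for the (blog_id, published) index, and the category_id filter is left out.

```python
import heapq


def top_k_heap(index, blog_ids, k):
    # One stream per blog_id, each already sorted by published, newest first.
    streams = [index.get(blog_id, []) for blog_id in blog_ids]
    # heapq is a min-heap, so store negated published values to pop the max.
    # Each heap entry is (-published, stream number, position in that stream).
    heap = [(-s[0][0], i, 0) for i, s in enumerate(streams) if s]
    heapq.heapify(heap)
    result = []
    while heap and len(result) < k:
        _neg_published, i, pos = heapq.heappop(heap)
        result.append(streams[i][pos][1])
        # Advance the stream we just consumed from, if it has more rows.
        if pos + 1 < len(streams[i]):
            heapq.heappush(heap, (-streams[i][pos + 1][0], i, pos + 1))
    return result


# Made-up sample data, for illustration only.
index = {
    13: [(900, 5), (700, 3), (100, 1)],
    14: [(800, 4), (600, 2)],
}
print(top_k_heap(index, [13, 14], 3))
```

Each of the k steps costs O(log n) for n streams, so only about k index entries are read in total, plus one initial entry per stream.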
Edit: Updating as per thieger's answer. I tried index hinting, and it doesn't seem to help. The results are attached below, at the end. thieger raised this concern:
"I agree that MySQL might possibly use the composite blog_id-published-index, but only for the blog_id part of the query."
The MySQL ORDER BY optimisation documentation claims to address it, listing this as a query where the index can be used to resolve the ORDER BY:
SELECT * FROM t1 WHERE key_part1=constant ORDER BY key_part2;
At least MySQL seems to claim the index can be used beyond just the WHERE clause (the blog_id part of the query). Any help, thieger?
Thanks, -Prasanna [myprasanna at gmail dot com]
CREATE TABLE IF NOT EXISTS `articles` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `category_id` int(11) DEFAULT NULL,
  `blog_id` int(11) DEFAULT NULL,
  `cluster_id` int(11) DEFAULT NULL,
  `title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `description` text COLLATE utf8_unicode_ci,
  `keywords` text COLLATE utf8_unicode_ci,
  `image_url` varchar(511) COLLATE utf8_unicode_ci DEFAULT NULL,
  `url` varchar(511) COLLATE utf8_unicode_ci DEFAULT NULL,
  `url_hash` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `author` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `categories` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `published` int(11) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `is_image_crawled` tinyint(1) DEFAULT NULL,
  `image_candidates` text COLLATE utf8_unicode_ci,
  `title_hash` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `article_readability_crawled` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_articles_on_url_hash` (`url_hash`),
  KEY `index_articles_on_cluster_id` (`cluster_id`),
  KEY `index_articles_on_published` (`published`),
  KEY `index_articles_on_is_image_crawled` (`is_image_crawled`),
  KEY `index_articles_on_category_id` (`category_id`),
  KEY `index_articles_on_title_hash` (`title_hash`),
  KEY `index_articles_on_article_readability_crawled` (`article_readability_crawled`),
  KEY `index_articles_on_blog_id` (`blog_id`,`published`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=562907 ;
SELECT id from articles USE INDEX(index_articles_on_blog_id) WHERE category_id = 11 AND blog_id IN (13,14,15,16,17,18,19,20,21,22,23,24,26,27,6330,6331,8269,12218,18889) order by published DESC LIMIT 380;
....
380 rows in set (11.27 sec)

explain SELECT id from articles USE INDEX(index_articles_on_blog_id) WHERE category_id = 11 AND blog_id IN (13,14,15,16,17,18,19,20,21,22,23,24,26,27,6330,6331,8269,12218,18889) order by published DESC LIMIT 380\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: articles
         type: range
possible_keys: index_articles_on_blog_id
          key: index_articles_on_blog_id
      key_len: 5
          ref: NULL
         rows: 8640
        Extra: Using where; Using filesort
1 row in set (0.00 sec)