MySQL performance with nested indices

I have a MySQL table (articles) with a nested index (blog_id, published), and it performs poorly. I see a lot of these in my slow query logs:

- Query_time: 23.184007 Lock_time: 0.000063 Rows_sent: 380 Rows_examined: 6341 SELECT id from articles WHERE category_id = 11 AND blog_id IN (13,14,15,16,17,18,19,20,21,22,23,24,26,27,6330,6331,8269,12218,18889) order by published DESC LIMIT 380;

I have trouble understanding why MySQL would run through all the rows with those blog_ids to figure out my top 380 rows. I would expect the whole purpose of the nested index to be to speed that up. At the very least, even a naive implementation should look up each blog_id and get its top 380 rows ordered by published. That should be fast, since the nested index lets us find exactly those 380 rows per blog. Then it would sort the resulting 19*380=7220 rows.
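The naive strategy described above can be sketched outside the database. This is only an illustration of the idea, not how MySQL works internally. The composite index is modelled as a sorted Python list of hypothetical `(blog_id, published, id)` tuples, and the function names and sample rows are made up for the example.

```python
from bisect import bisect_left

def newest_per_blog(index, blog_id, limit):
    """Return up to `limit` entries for one blog, newest first.

    `index` is a list of (blog_id, published, id) tuples sorted ascending,
    a stand-in for the (blog_id, published) index. Assumes integer blog_ids.
    """
    lo = bisect_left(index, (blog_id,))       # first entry for this blog
    hi = bisect_left(index, (blog_id + 1,))   # first entry of the next blog
    start = max(lo, hi - limit)               # keep only the newest `limit`
    return index[start:hi][::-1]

def top_n_naive(index, blog_ids, limit):
    """Take each blog's newest `limit` rows, then sort the small candidate set."""
    candidates = []
    for blog_id in blog_ids:
        candidates.extend(newest_per_blog(index, blog_id, limit))
    candidates.sort(key=lambda entry: entry[1], reverse=True)
    return [article_id for _, _, article_id in candidates[:limit]]

# Hypothetical sample rows: (blog_id, published, id)
SAMPLE_INDEX = sorted([(13, 5, 1), (13, 9, 2), (14, 7, 3), (14, 1, 4), (15, 8, 5)])
```

With this sketch, the final sort touches at most `len(blog_ids) * limit` candidates, regardless of how many articles each blog has.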

If one were to implement it optimally, you would build a heap over the set of per-blog_id streams, pick the one with the max(published), and repeat that 380 times. Each operation should be fast.
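A minimal sketch of that heap-based merge, again outside the database and under the same assumptions: the index is modelled as a sorted list of hypothetical `(blog_id, published, id)` tuples, and all names are illustrative. Python's standard `heapq.merge` maintains the heap over the per-blog streams and is lazy, so only about `limit` entries are ever pulled.

```python
import heapq
from bisect import bisect_left
from itertools import islice

def blog_stream(index, blog_id):
    """Yield one blog's entries newest first by walking its index range backwards."""
    lo = bisect_left(index, (blog_id,))
    hi = bisect_left(index, (blog_id + 1,))
    for pos in range(hi - 1, lo - 1, -1):
        yield index[pos]

def top_n_merge(index, blog_ids, limit):
    """K-way merge of the per-blog streams, stopping after `limit` rows."""
    streams = [blog_stream(index, blog_id) for blog_id in blog_ids]
    # reverse=True tells heapq.merge that every input is sorted descending.
    merged = heapq.merge(*streams, key=lambda entry: entry[1], reverse=True)
    return [article_id for _, _, article_id in islice(merged, limit)]

# Hypothetical sample rows: (blog_id, published, id)
SAMPLE_INDEX = sorted([(13, 5, 1), (13, 9, 2), (14, 7, 3), (14, 1, 4), (15, 8, 5)])
```

Each step costs a heap operation over the number of blogs, so the total work is roughly `limit * log(len(blog_ids))` index reads rather than a sort of every matching row.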

I'm surely missing something, since Google, Facebook, Twitter, Microsoft and all the big companies use MySQL in production. Anyone with experience?

Edit: Updating as per thieger's answer. I tried index hinting, and it doesn't seem to help. Results are attached below, at the end. The MySQL ORDER BY optimisation documentation claims to address the concern thieger is raising:

I agree that MySQL might possibly use the composite blog_id-published-index, but only for the blog_id part of the query.

SELECT * FROM t1 WHERE key_part1=constant ORDER BY key_part2;

At least MySQL seems to claim that the index can be used beyond just the WHERE clause (the blog_id part of the query). Any help, thieger?

Thanks, -Prasanna [myprasanna at gmail dot com]

CREATE TABLE IF NOT EXISTS `articles` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `category_id` int(11) DEFAULT NULL,
  `blog_id` int(11) DEFAULT NULL,
  `cluster_id` int(11) DEFAULT NULL,
  `title` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `description` text COLLATE utf8_unicode_ci,
  `keywords` text COLLATE utf8_unicode_ci,
  `image_url` varchar(511) COLLATE utf8_unicode_ci DEFAULT NULL,
  `url` varchar(511) COLLATE utf8_unicode_ci DEFAULT NULL,
  `url_hash` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `author` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `categories` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `published` int(11) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  `is_image_crawled` tinyint(1) DEFAULT NULL,
  `image_candidates` text COLLATE utf8_unicode_ci,
  `title_hash` varchar(50) COLLATE utf8_unicode_ci DEFAULT NULL,
  `article_readability_crawled` tinyint(1) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_articles_on_url_hash` (`url_hash`),
  KEY `index_articles_on_cluster_id` (`cluster_id`),
  KEY `index_articles_on_published` (`published`),
  KEY `index_articles_on_is_image_crawled` (`is_image_crawled`),
  KEY `index_articles_on_category_id` (`category_id`),
  KEY `index_articles_on_title_hash` (`title_hash`),
  KEY `index_articles_on_article_readability_crawled` (`article_readability_crawled`),
  KEY `index_articles_on_blog_id` (`blog_id`,`published`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=562907 ;

SELECT id from articles USE INDEX(index_articles_on_blog_id) WHERE category_id = 11 AND blog_id IN (13,14,15,16,17,18,19,20,21,22,23,24,26,27,6330,6331,8269,12218,18889) order by published DESC LIMIT 380;

....
380 rows in set (11.27 sec)

explain SELECT id from articles USE INDEX(index_articles_on_blog_id) WHERE category_id = 11 AND blog_id IN (13,14,15,16,17,18,19,20,21,22,23,24,26,27,6330,6331,8269,12218,18889) order by published DESC LIMIT 380\G;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: articles
         type: range
possible_keys: index_articles_on_blog_id
          key: index_articles_on_blog_id
      key_len: 5
          ref: NULL
         rows: 8640
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

Also, I forgot to mention that blog_id has a unique association with category_id, so the category_id = xxx part of the query can be removed. So it does not seem to make sense to include category_id in any index.

Prasanna 2010-08-03 23:13:54

Also updated the question with edits. Please take a look. Thanks for the response.

Prasanna 2010-08-03 23:14:17

As to the "unique association with category_id", I'm not sure what you mean, but if MySQL doesn't know about it, it doesn't matter anyway. As to order by using the index, think again: if the index is ordered by blog_id, and then by published, and you ask MySQL to select ranges of records with several blog_ids, then the result cannot be already ordered by published, so it has to be sorted again. But I'm also puzzled by your EXPLAIN output, with MySQL claiming that it uses the index but still considers all records -- or are there more than 8000 records? In the first output it was only about 6000.

thieger 2010-08-03 23:47:10
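thieger's point about ranges can be checked with a small sketch. Assuming the index behaves like a list sorted by `(blog_id, published)`, reading the ranges for several blog_ids in index order yields `published` values that are sorted within each blog but not across blogs. The tuples and helper names below are hypothetical.

```python
def published_in_index_order(index, blog_ids):
    """Read `published` values for the given blogs in index order."""
    wanted = set(blog_ids)
    return [published for blog_id, published, _ in index if blog_id in wanted]

def is_sorted(values):
    """True if the values are in non-decreasing order."""
    return all(a <= b for a, b in zip(values, values[1:]))

# Hypothetical sample rows: (blog_id, published, id), sorted like the index
SAMPLE_INDEX = sorted([(13, 5, 1), (13, 9, 2), (14, 1, 4), (14, 7, 3)])

single_blog = published_in_index_order(SAMPLE_INDEX, [13])        # one range
several_blogs = published_in_index_order(SAMPLE_INDEX, [13, 14])  # two ranges
```

Here `single_blog` comes back already ordered, while `several_blogs` does not, which is consistent with the `Using filesort` in the EXPLAIN output for the `IN (...)` query.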

Just to see the extreme case of this: if I do LIMIT 1, would MySQL fetch all the thousands of rows and sort them? When you intersect with blog_id, the extra information you have is the ordering of published. But it seems MySQL is not using that. Anyway, I'll mark this question as answered. Thanks, cheers.

Prasanna 2010-08-04 00:23:55

Prasanna 2010-08-03 23:21:20

So, what does the query do without the category_id? And how is your innodb key status?

Wrikken 2010-08-03 23:31:06

As can be seen from Prasanna's edit, MySQL in fact uses the index. (And apart from him saying that MySQL examines all rows -- about the 6000 -- we don't know how many rows the table has). And, as Prasanna has correctly pointed out, there are cases where an index can be used for both the where and the order by part. It just seems this is not such a query, probably because "in (...)" is not a constant in the sense required here.

thieger 2010-08-04 00:45:40

Ah... yes, you're right: it is using the index. I also misread the 'rows' column in the EXPLAIN. And I actually agreed with Prasanna's link about the index. As his query stands, MySQL won't use the index for the `ORDER BY` clause. I may not have been as clear as possible on this. IME, most people with this sort of problem take a while to realize the `ORDER BY` needs to reference the same columns as in the `WHERE` clause for it to use the index for sorting.

staticsan 2010-08-04 04:02:01
