ansaurus

Question

How to optimize a query that's using group by on a large number of rows

Answer 1

A:

You're not using the index tweet_tweet_entity_created. Change your query to:

explain SELECT `tweet_tweet`.`entity_id`, 
       STDDEV_POP(`tweet_tweet`.`positive_sentiment`) AS `sentiment_stddev`, 
       AVG(`tweet_tweet`.`positive_sentiment`) AS `sentiment_avg`, 
       COUNT(`tweet_tweet`.`id`) AS `tweet_count` 
       FROM `tweet_tweet` FORCE INDEX (tweet_tweet_entity_created)
       WHERE `tweet_tweet`.`created_at` > '2010-10-06 16:24:43'  
       GROUP BY `tweet_tweet`.`entity_id` ORDER BY `tweet_tweet`.`entity_id` ASC;

You can read more about index hints in the MySQL manual http://dev.mysql.com/doc/refman/5.1/en/index-hints.html

Sometimes MySQL's query optimizer needs a little help.

William 2010-10-06 17:50:07

Answer 2

A:

MySQL has a dirty little secret. When you create an index over multiple columns, only the first one is really "used". I've made tables that used Unique Keys and Foreign Keys, and I often had to set a separate index for one or more of the columns.

I suggest adding an extra index to just created_at at a minimum. I do not know if adding indexes to the aggregate columns will also speed things up.

Wolfman2000 2010-10-06 17:50:21

Sometimes you need to let the query optimizer know what index to use.

William 2010-10-06 17:52:40

@Wolfman2000: This is like saying that in a telephone book, only people's last names are used. Of course this is not true. First names are in there too, but the book is ordered is by last name first. So if you want to search for everyone with a given first name, the index doesn't help.

Bill Karwin 2010-10-06 17:59:01

Guess I misunderstood the question. Sorry about that.

Wolfman2000 2010-10-08 19:26:46

Answer 3

A:

You may try to reorder fields in the index (i.e. KEY tweet_tweet_entity_created (created_at, entity_id). That will allow mysql to use the index to reduce the quantity of actual rows that need to be grouped and ordered).

cryo28 2010-10-06 17:50:33

Wolfmann2000 in his answer said that mysql uses only the first key of the multicolumn index. Actually it is not really correct. It is up to the structure of B-Tree indices. If it makes sense for a particular query to use both keys from the multicolumn index, mysql will use both. If it is not - it will use only the first one. However, you can't use the second column of a multicolumn index from the query which doesn't need the first one. So when you create multicolumn indices you should always think about the order of columns in a multicolumn index you are going to create.

cryo28 2010-10-06 17:59:54

That did it! The query now takes 0.73 seconds!

2010-10-06 18:19:29

Answer 4

A:

if your mysql version 5.1 or higher ,you can consider partitioning option for large tables.

http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

Baris Akverdi 2010-10-13 13:46:53

ansaurus

tags:

views:

answers:

How to optimize a query that's using group by on a large number of rows

related questions