tags:

views:

42

answers:

3

I have an elementary query that is taking too long to execute even on small tables (<100,000 rows):

select images.classification, count(boxes.id)
from images
join boxes on images.id = boxes.image_id
where boxes.round = 0
group by images.classification;

I have indices on boxes.round, boxes.image_id, and images.classification (only varchar in this query). Primary keys on boxes.id and images.id. Explain indicates that it is taking advantage of the boxes.round index. The extra's are: Using where; Using temporary; Using filesort.

Is it possible to speed up this query? How?

If it matters, server is MySQL 5.1 all with MyISAM tables.

(This question is similar to http://stackoverflow.com/questions/1031312/how-to-speed-up-select-count-with-group-by-and-where)


Full explain output:

mysql> explain select images.classification, count(boxes.id) from images join boxes on images.id = boxes.image_id where boxes.round = 0 group by images.classification;

|  1 | SIMPLE      | boxes  | ref    | ix_boxes_image_id,ix_boxes_round | ix_boxes_round | 5       | const                       | 64162 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | images | eq_ref | PRIMARY                                        | PRIMARY               | 4       | vatic.boxes.image_id |     1 |                                              |
A: 

Add an index to images.id.

RedFilter
images.id is a primary key.
carl
A: 

Would it be feasible to use an integer for images.classification instead of a varchar?

ford
It could be possible, but it would require a large migration for me when this query goes live.
carl
Good point, INT is usually faster than VARCHAR.
Piskvor
+2  A: 

Add an index on (images.id, images.classification) - that ought to give you

  • an index for the join. You already have that via the primary key, but you'll also get:
  • the same index usable as a covering index
  • the same index for the grouping clause

Also add an index on (boxes.image_id,boxes.round):

  • index for the JOIN
  • covering index for the COUNT (see below)

About the COUNT clause: if you don't have NULLs in boxes.id (assuming not), you can replace it with COUNT(boxes.image_id) so we can get some more use from the index from prev. paragraph.

Again, verify with EXPLAIN, but I'd say these steps should get you a better execution plan.

Piskvor
The extras: `Using where` - can't figure out all the conditions just from indexes - needs to look into the tables proper for evaluating WHERE (slower); `Using temporary` - need to create a temp result table for this (will slow down with large tables); `Using filesort` - temp table too large to fit into memory, will swap it to disk (disk access is slow)
Piskvor