views:

42

answers:

2

I'm doing a quite complicated find with lots of includes, which rails is splitting into a sequence of discrete queries rather than do a single big join. The queries are really slow - my dataset isn't massive, with none of the tables having more than a few thousand records.

I have indexed all of the fields which are examined in the queries but i'm worried that the indexes aren't helping for some reason: i installed a plugin called "query_reviewer" which looks at the queries used to build a page, and lists problems with them. This states that indexes AREN'T being used, and it features the results of calling 'explain' on the query, which lists various problems. Here's an example find call:

Question.paginate(:all, {:page=>1, :include=>[:answers, :quizzes, :subject, {:taggings=>:tag}, {:gradings=>[:age_group, :difficulty]}], :conditions=>["((questions.subject_id = ?) or (questions.subject_id = ? and tags.name = ?))", "1", 19, "English"], :order=>"subjects.name, (gradings.difficulty_id is null), gradings.age_group_id, gradings.difficulty_id", :per_page=>30})

And here are the generated sql queries:

SELECT DISTINCT `questions`.id 
  FROM `questions` 
  LEFT OUTER JOIN `taggings` ON `taggings`.taggable_id = `questions`.id 
    AND `taggings`.taggable_type = 'Question' 
  LEFT OUTER JOIN `tags` ON `tags`.id = `taggings`.tag_id 
  LEFT OUTER JOIN `subjects` ON `subjects`.id = `questions`.subject_id 
  LEFT OUTER JOIN `gradings` ON gradings.question_id = questions.id 
  WHERE (((questions.subject_id = '1') or (questions.subject_id = 19 and tags.name = 'English'))) 
  ORDER BY subjects.name, (gradings.difficulty_id is null), gradings.age_group_id, gradings.difficulty_id 
  LIMIT 0, 30

SELECT `questions`.`id` AS t0_r0 <..etc...> 
  FROM `questions` 
  LEFT OUTER JOIN `answers` ON answers.question_id = questions.id 
  LEFT OUTER JOIN `quiz_questions` ON (`questions`.`id` = `quiz_questions`.`question_id`) 
  LEFT OUTER JOIN `quizzes` ON (`quizzes`.`id` = `quiz_questions`.`quiz_id`) 
  LEFT OUTER JOIN `subjects` ON `subjects`.id = `questions`.subject_id 
  LEFT OUTER JOIN `taggings` ON `taggings`.taggable_id = `questions`.id 
    AND `taggings`.taggable_type = 'Question' 
  LEFT OUTER JOIN `tags` ON `tags`.id = `taggings`.tag_id 
  LEFT OUTER JOIN `gradings` ON gradings.question_id = questions.id 
  LEFT OUTER JOIN `age_groups` ON `age_groups`.id = `gradings`.age_group_id 
  LEFT OUTER JOIN `difficulties` ON `difficulties`.id = `gradings`.difficulty_id 
  WHERE (((questions.subject_id = '1') or (questions.subject_id = 19 and tags.name = 'English'))) 
    AND `questions`.id IN (602, 634, 666, 698, 730, 762, 613, 645, 677, 709, 741, 592, 624, 656, 688, 720, 752, 603, 635, 667, 699, 731, 763, 614, 646, 678, 710, 742, 593, 625) 
  ORDER BY subjects.name, (gradings.difficulty_id is null), gradings.age_group_id, gradings.difficulty_id

SELECT count(DISTINCT `questions`.id) AS count_all FROM `questions` 
  LEFT OUTER JOIN `answers` ON answers.question_id = questions.id 
  LEFT OUTER JOIN `quiz_questions` ON (`questions`.`id` = `quiz_questions`.`question_id`) 
  LEFT OUTER JOIN `quizzes` ON (`quizzes`.`id` = `quiz_questions`.`quiz_id`) 
  LEFT OUTER JOIN `subjects` ON `subjects`.id = `questions`.subject_id 
  LEFT OUTER JOIN `taggings` ON `taggings`.taggable_id = `questions`.id 
    AND `taggings`.taggable_type = 'Question' 
  LEFT OUTER JOIN `tags` ON `tags`.id = `taggings`.tag_id 
  LEFT OUTER JOIN `gradings` ON gradings.question_id = questions.id 
  LEFT OUTER JOIN `age_groups` ON `age_groups`.id = `gradings`.age_group_id 
  LEFT OUTER JOIN `difficulties` ON `difficulties`.id = `gradings`.difficulty_id 
  WHERE (((questions.subject_id = '1') or (questions.subject_id = 19 and tags.name = 'English')))

Actually, looking at these all nicely formatted here, there's a crazy amount of joining going on here. This can't be optimal surely. Anyway, it looks like i have two questions.

1) I have an index on each of the ids and foreign key fields referred to here. The second of the above queries is the slowest, and calling explain on it (doing it directly in mysql) gives me the following:

+----+-------------+----------------+--------+---------------------------------------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------+------+----------------------------------------------+
| id | select_type | table          | type   | possible_keys                                                                   | key                                             | key_len | ref                                            | rows | Extra                                        |
+----+-------------+----------------+--------+---------------------------------------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | questions      | range  | PRIMARY,index_questions_on_subject_id                                           | PRIMARY                                         | 4       | NULL                                           |   30 | Using where; Using temporary; Using filesort | 
|  1 | SIMPLE      | answers        | ref    | index_answers_on_question_id                                                    | index_answers_on_question_id                    | 5       | millionaire_development.questions.id           |    2 |                                              | 
|  1 | SIMPLE      | quiz_questions | ref    | index_quiz_questions_on_question_id                                             | index_quiz_questions_on_question_id             | 5       | millionaire_development.questions.id           |    1 |                                              | 
|  1 | SIMPLE      | quizzes        | eq_ref | PRIMARY                                                                         | PRIMARY                                         | 4       | millionaire_development.quiz_questions.quiz_id |    1 |                                              | 
|  1 | SIMPLE      | subjects       | eq_ref | PRIMARY                                                                         | PRIMARY                                         | 4       | millionaire_development.questions.subject_id   |    1 |                                              | 
|  1 | SIMPLE      | taggings       | ref    | index_taggings_on_taggable_id_and_taggable_type,index_taggings_on_taggable_type | index_taggings_on_taggable_id_and_taggable_type | 263     | millionaire_development.questions.id,const     |    1 |                                              | 
|  1 | SIMPLE      | tags           | eq_ref | PRIMARY                                                                         | PRIMARY                                         | 4       | millionaire_development.taggings.tag_id        |    1 | Using where                                  | 
|  1 | SIMPLE      | gradings       | ref    | index_gradings_on_question_id                                                   | index_gradings_on_question_id                   | 5       | millionaire_development.questions.id           |    2 |                                              | 
|  1 | SIMPLE      | age_groups     | eq_ref | PRIMARY                                                                         | PRIMARY                                         | 4       | millionaire_development.gradings.age_group_id  |    1 |                                              | 
|  1 | SIMPLE      | difficulties   | eq_ref | PRIMARY                                                                         | PRIMARY                                         | 4       | millionaire_development.gradings.difficulty_id |    1 |                                              | 
+----+-------------+----------------+--------+---------------------------------------------------------------------------------+-------------------------------------------------+---------+------------------------------------------------+------+----------------------------------------------+

The query_reviewer plugin has this to say about it - it lists several problems:

Table questions:  Using temporary table, Long key length (263), Using filesort 
MySQL must do an extra pass to find out how to retrieve the rows in sorted order.
To resolve the query, MySQL needs to create a temporary table to hold the result.
The key used for the index was rather long, potentially affecting indices in memory

2) It looks like rails isn't splitting this find up in a very optimal way. Is it, do you think? Am i better off doing several find queries manually rather than one big combined one?

Grateful for any advice, max

+1  A: 

Usually ActiveRecord preloads associations with separate queries, as it is usually faster. However, when it notices, that you have used included associations in your :conditions or :order, it performs one big query, including all tables, not only needed ones.

What you can probably do, is to include only those tables, which are used in conditions and preload all other associations afterwards:

questions = Question.paginate(:all, {:page=>1, :include => [:subject, {:taggings=>:tag}, :gradings], :conditions=>["((questions.subject_id = ?) or (questions.subject_id = ? and tags.name = ?))", "1", 19, "English"], :order=>"subjects.name, (gradings.difficulty_id is null), gradings.age_group_id, gradings.difficulty_id", :per_page=>30})

Question.send(:preload_associations, questions, [:answers, :quizzes, {:gradings=>[:age_group, :difficulty]}])

First query will run on subjects, taggings, tags, gradings and questions tables, as they are used in coditions/order. And :answers, :quizzes, age_group and :difficulty will be 4 separate simple queries.

And then you can try and optimize more your indexes etc.

Voyta
A: 

Output from query_reviewer says that mysql has to your query has to be called twice because of some order_by problem. You can test if it is cousing your problems, by simply removing order part from your call. If it is the problem, then it should run much faster.

Your query looks very complicated. Are you sure you need to load such complicated data? If you don't access most of the fields in associated models, than it would be better not to include them.

klew