views:

264

answers:

3

Recently, we've noticed a particular query popping up in our slow query logs taking quite some time. I've analyzed it to the best of my ability, but can't figure out for the life of me why it's taking so long, and why the indexes we've set up aren't being used.

Here's a simplified (i.e., readable) version of the query for the purpose of example:

SELECT processstage.id AS processstage_id, 
  processstage.job_id AS processstage_job_id, 
  processstage.event_id AS processstage_event_id, ...
FROM processstage INNER JOIN jobevent ON jobevent.id = processstage.event_id 
WHERE processstage.due_date <= '2009-10-28 16:07:59' AND (EXISTS (
  SELECT 1 FROM job 
  WHERE jobevent.job_id = job.id AND job.brand_id = 1
)) ORDER BY processstage.due_date;

Also, for good measure, it's important to note the size of the table, processstage:

mysql> SELECT COUNT(id) FROM processstage;
+-----------+
| COUNT(id) |
+-----------+
|    596183 | 
+-----------+

When I run EXPLAIN on the query, I find that the processstage table is reading over a huge number of rows (see "Using where; Using filesort"), as no index is being used (that I can tell):

mysql> EXPLAIN SELECT processstage.id AS processstage_id, processstage.job_id AS processstage_job_id, processstage.event_id AS processstage_event_id     FROM processstage INNER JOIN jobevent ON jobevent.id = processstage.event_id      WHERE processstage.due_date <= '2009-10-28 16:07:59' AND (EXISTS (SELECT 1      FROM job      WHERE jobevent.job_id = job.id AND job.brand_id = 1)) ORDER BY processstage.due_date;
+----+--------------------+--------------+--------+---------------------------------------------------+---------+---------+------------------------------+--------+-----------------------------+
| id | select_type        | table        | type   | possible_keys                                     | key     | key_len | ref                          | rows   | Extra                       |
+----+--------------------+--------------+--------+---------------------------------------------------+---------+---------+------------------------------+--------+-----------------------------+
|  1 | PRIMARY            | processstage | ALL    | ix_processstage_due_date,processstage_event_id_fk | NULL    | NULL    | NULL                         | 606045 | Using where; Using filesort | 
|  1 | PRIMARY            | jobevent     | eq_ref | PRIMARY                                           | PRIMARY | 4       | processstage.event_id |      1 | Using where                 | 
|  2 | DEPENDENT SUBQUERY | job          | eq_ref | PRIMARY,ix_job_brand_id                           | PRIMARY | 4       | jobevent.job_id       |      1 | Using where                 | 
+----+--------------------+--------------+--------+---------------------------------------------------+---------+---------+------------------------------+--------+-----------------------------+
3 rows in set (0.00 sec)

Oddly enough, we've got indexes on the columns utilized in the query's WHERE clause:

mysql> SHOW INDEXES FROM processstage;
+--------------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table        | Non_unique | Key_name                   | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+--------------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| processstage |          0 | PRIMARY                    |            1 | id          | A         |      614150 |     NULL | NULL   |      | BTREE      |         | 
| processstage |          1 | ix_processstage_job_id     |            1 | job_id      | A         |       47242 |     NULL | NULL   | YES  | BTREE      |         | 
| processstage |          1 | ix_processstage_stop_date  |            1 | stop_date   | A         |      614150 |     NULL | NULL   | YES  | BTREE      |         | 
| processstage |          1 | ix_processstage_order      |            1 | order       | A         |          16 |     NULL | NULL   | YES  | BTREE      |         | 
| processstage |          1 | ix_processstage_start_date |            1 | start_date  | A         |      122830 |     NULL | NULL   | YES  | BTREE      |         | 
| processstage |          1 | ix_processstage_milestone  |            1 | milestone   | A         |       12794 |     NULL | NULL   | YES  | BTREE      |         | 
| processstage |          1 | ix_processstage_due_date   |            1 | due_date    | A         |       51179 |     NULL | NULL   | YES  | BTREE      |         | 
| processstage |          1 | ix_processstage_process_id |            1 | process_id  | A         |       76768 |     NULL | NULL   | YES  | BTREE      |         | 
| processstage |          1 | processstage_event_id_fk   |            1 | event_id    | A         |        3722 |     NULL | NULL   | YES  | BTREE      |         | 
+--------------+------------+----------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

Any ideas as to why our indexes are going seemingly unused? Perhaps I'm missing something very obvious or am approaching this entirely the wrong way. Any input or suggestion is very much appreciated!

+1  A: 

How many jobevents do you get where job.brand = 1? If it's a reasonable amount you could try the following queries to alleviate some joins and the sub query.

First Query:

select distinct jobevent.id from jobevent
inner join job on job.id = jobevent.job_id
where job.brand = 1

followed by

select processstage.id as processstage_id, 
  processstage.job_id as processstage_job_id, 
  processstage.event_id as processstage_event_id, ...
from processstage 
where processstage.due_date <= '2009-10-28 16:07:59' and 
processstage.event_id in (list of event ids from the previous query) 
order by processstage.due_date;

Previous Thoughts:

Have you tried doing a regular join to jobs instead of the dependent sub query? Something like:

SELECT processstage.id AS processstage_id, 
  processstage.job_id AS processstage_job_id, 
  processstage.event_id AS processstage_event_id, ...
FROM processstage 
INNER JOIN jobevent ON jobevent.id = processstage.event_id 
INNER JOIN job ON job.id = jobevent.job_id 
WHERE processstage.due_date <= '2009-10-28 16:07:59' AND 
job.brand = 1 
ORDER BY processstage.due_date;

Do you have job events that don't have jobs?

r-dub
We've also tried this with similar results. EXPLAINing this query shows that the index, `ix_processstage_due_date` still isn't being utilized.
RyanTheDev
What database engine are you using?
r-dub
Have you tried a composite index of due_date and event_id?
r-dub
We're using InnoDB. We're currently adding the composite index to see if performance improves.
RyanTheDev
Adding this index didn't notably enhance performance (that we can tell). It's almost as if the indexes are just simply being ignored.
RyanTheDev
I'd give what longneck suggested a try, here's the man pages on forcing index usage http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
r-dub
I added a couple more suggestions to my answer. Also can you post the index's now that you created the composite index?
r-dub
+1  A: 

try using USE INDEX. if mysql decides that it needs to read a certain fraction of the table, it will resort to a table scan. with USE INDEX, you're telling mysql to consider table scans very expensive.

longneck
A: 

Some ideas:

  • Do an explicit type conversion from your string '2009-10-28 16:07:59' to a date. What may happen now is that your processstage.due_date are converted to strings before the comparison. Doing a type conversion (not sure about the systax in MySQL, but should be something like CAST (<your date string> as DATE) will help the optimizer figure out to use the index.
  • How many records satisfy the date condition? If most of the table fits the condition or the index is not very selective, there may not be any advantage to using the index.
IronGoofy
Thanks for the input.1. Casting the date explicitly using MySQL's DATE() doesn't seem to persuade the optimizer. A good suggestion, though.2. Around 4,000 (the entire `processstage` table has around 600K rows) Here's a simple query that just joins the two tables and filters on the due_date column:`SELECT COUNT(processstage.id) FROM processstage INNER JOIN jobevent ON jobevent.id = processstage.event_id WHERE processstage.due_date <= '2009-10-28 16:07:59';`
RyanTheDev
Does the select you mentioned use the index on due_date?
IronGoofy