views:

521

answers:

4

I have a query which is starting to cause some concern in my application. I'm trying to understand this EXPLAIN statement better to understand where indexes are potentially missing:

      +----+-------------+-------+--------+---------------+------------+---------+-------------------------------+------+---------------------------------+
      | id | select_type | table | type   | possible_keys | key        | key_len | ref                           | rows | Extra                           |
      +----+-------------+-------+--------+---------------+------------+---------+-------------------------------+------+---------------------------------+
      |  1 | SIMPLE      | s     | ref    | client_id     | client_id  | 4       | const                         |  102 | Using temporary; Using filesort | 
      |  1 | SIMPLE      | u     | eq_ref | PRIMARY       | PRIMARY    | 4       | www_foo_com.s.user_id    |    1 |                                      | 
      |  1 | SIMPLE      | a     | ref    | session_id    | session_id | 4       | www_foo_com.s.session_id |    1 | Using index                          | 
      |  1 | SIMPLE      | h     | ref    | email_id      | email_id   | 4       | www_foo_com.a.email_id   |   10 | Using index                          | 
      |  1 | SIMPLE      | ph    | ref    | session_id    | session_id | 4       | www_foo_com.s.session_id |    1 | Using index                          | 
      |  1 | SIMPLE      | em    | ref    | session_id    | session_id | 4       | www_foo_com.s.session_id |    1 |                                      | 
      |  1 | SIMPLE      | pho   | ref    | session_id    | session_id | 4       | www_foo_com.s.session_id |    1 |                                      | 
      |  1 | SIMPLE      | c     | ALL    | userfield     | NULL       | NULL    | NULL                          | 1108 |                                 | 
      +----+-------------+-------+--------+---------------+------------+---------+-------------------------------+------+---------------------------------+
      8 rows in set (0.00 sec) 

I'm trying to understand where my indexes are missing by reading this EXPLAIN statement. Is it fair to say that one can understand how to optimize this query without seeing the query at all and just look at the results of the EXPLAIN?

It appears that the ALL scan against the 'c' table is the achilles heel. What's the best way to index this based on constant values as recommended on MySQL's documentation? |

Note, I also added an index to userfield in the cdr table and that hasn't done much good either.

Thanks.

--- edit ---

Here's the query, sorry -- don't know why I neglected to include it the first pass through.

SELECT s.`session_id` id,
                  DATE_FORMAT(s.`created`,'%m/%d/%Y') date,
                  u.`name`,
                  COUNT(DISTINCT c.id) calls,
                  COUNT(DISTINCT h.id) emails,
                  SEC_TO_TIME(MAX(DISTINCT c.duration)) duration,
                  (COUNT(DISTINCT em.email_id) + COUNT(DISTINCT pho.phone_id) > 0) status
           FROM `fa_sessions` s
           LEFT JOIN `fa_users` u ON s.`user_id`=u.`user_id`
           LEFT JOIN `fa_email_aliases` a ON a.session_id = s.session_id
           LEFT JOIN `fa_email_headers` h ON h.email_id = a.email_id
           LEFT JOIN `fa_phones` ph ON ph.session_id = s.session_id
           LEFT JOIN `fa_email_aliases` em ON em.session_id = s.session_id AND em.status = 1
           LEFT JOIN `fa_phones` pho ON pho.session_id = s.session_id AND pho.status = 1
           LEFT JOIN `cdr` c ON c.userfield = ph.phone_id
           WHERE s.`partner_id`=1
           GROUP BY s.`session_id`
+2  A: 

Well looking at the query would be useful, but there's at least one thing that's obviously worth looking into - the final line shows the ALL type for that part of the query, which is generally not great to see. If the suggested possible key (userfield) makes sense as an added index to table c, it might be worth adding it and seeing if that reduces the rows returned for that table in the search.

Keith Lawrence
+3  A: 

I assume you've looked here to get more info about what it is telling you. Obviously the ALL means its going through all of them. The using temporary and using filesort are talked about on that page. You might want to look at that.

From the page:

Using filesort

MySQL must do an extra pass to find out how to retrieve the rows in sorted order. The sort is done by going through all rows according to the join type and storing the sort key and pointer to the row for all rows that match the WHERE clause. The keys then are sorted and the rows are retrieved in sorted order. See Section 7.2.12, “ORDER BY Optimization”.

Using temporary

To resolve the query, MySQL needs to create a temporary table to hold the result. This typically happens if the query contains GROUP BY and ORDER BY clauses that list columns differently.

I agree that seeing the query might help to figure things out better.

itsmatt
+3  A: 

My advice?

Break the query into 2 and use a temporary table in the middle.

Reasonning

The problem appears to be that table c is being table scanned, and that this is the last table in the query. This is probably bad: if you have a table scan, you want to do it at the start of the query, so it's only done once.

I'm not a MySQL guru, but I have spent a whole lot of time optimising queries on other DBs. It looks to me like the optimiser hasn't worked out that it should start with c and work backwards.

The other thing that strikes me is that there are probably too many tables in the join. Most optimisers struggle with more than 4 tables (because the number of possible table orders is growing exponentially, so checking them all becomes impractical).
Having too many tables in a join is the root of 90% of performance problems I have seen.

Give it a go, and let us know how you get on. If it doesn't help, please post the SQL, table definitions and indeces, and I'll take another look.

General Tips

Feel free to look at this answer I gave on general performance tips.

A great resource

MySQL Documentation for EXPLAIN

AJ
+1  A: 

Query Plan

The query plan we might hope the optimiser would choose would be something like:

  • start with sessions where partner_id=1 , possibly using an index on partner_id,
  • join sessions to users, using an index on user_id
  • join sessions to phones, where status=1, using an index on session_id and possibly status
  • join sessions to phones again using an index on session_id and phone_id **
  • join phones to cdr using an index on userfield
  • join sessions to email_aliases, where status=1 using an index on session_id and possibly status
  • join sessions to email_aliases again using an index on session_id and email_id **
  • join email_aliases to email_headers using an index on email_id

** by putting 2 fields in these indeces, we enable the optimiser to join to the table using session_id, and immediately find out the associated phone_id or email_id without having to read the underlying table. This technique saves us a read, and can save a lot of time.

Indeces I would create:

The above query plan suggests these indeces:

fa_sessions ( partner_id, session_id )  
fa_users ( user_id )  
fa_email_aliases ( session_id, email_id )  
fa_email_headers ( email_id )  
fa_email_aliases ( session_id, status )  
fa_phones ( session_id, status, phone_id ) 
cdr ( userfield )

Notes

  • You will almost certainly get acceptable performance without creating all of these.
  • If any of the tables are small ( less than 100 rows ) then it's probably not worth creating an index.
  • fa_email_aliases might work with ( session_id, status, email_id ), depending on how the optimiser works.
AJ