views:

3654

answers:

6

I've been profiling some queries in an application I'm working on, and I came across a query that was retrieving more rows than necessary, the result set being trimmed down in the application code.

Changing a LEFT JOIN to an INNER JOIN trimmed the result set to just what was needed, and presumably would also be more performant (since less rows are selected). In reality, the LEFT JOIN'ed query was outperforming the INNER JOIN'ed, taking half the time to complete.

LEFT JOIN: (127 total rows, Query took 0.0011 sec)

INNER JOIN: (10 total rows, Query took 0.0024 sec)

(I ran the queries multiple times and those are averages).

Running EXPLAIN on both reveals nothing that explains the performance differences:

For the INNER JOIN:

id  select_type  table  type  possible_keys  key  key_len  ref     rows  Extra
1   SIMPLE  contacts      index      NULL      name     302  NULL       235  Using where
1   SIMPLE  lists          eq_ref      PRIMARY  PRIMARY  4  contacts.list_id   1   
1   SIMPLE  lists_to_users  eq_ref   PRIMARY  PRIMARY  8  lists.id,const  1   
1   SIMPLE  tags          eq_ref     PRIMARY  PRIMARY  4  lists_to_users.tag_id  1   
1   SIMPLE  users          eq_ref      email_2  email_2  302  contacts.email 1  Using where

For the LEFT JOIN:

id  select_type  table  type  possible_keys  key  key_len  ref  rows  Extra
1   SIMPLE       contacts index      NULL      name      302  NULL  235  Using where
1   SIMPLE        lists  eq_ref   PRIMARY  PRIMARY  4  contacts.list_id  1   
1   SIMPLE    lists_to_users eq_ref     PRIMARY  PRIMARY  8  lists.id,const  1   
1   SIMPLE         tags  eq_ref      PRIMARY  PRIMARY  4  lists_to_users.tag_id  1   
1   SIMPLE        users  eq_ref      email_2  email_2  302  contacts.email  1

And the query itself:

SELECT `contacts`.*, `lists`.`name` AS `group`, `lists`.`id` AS `group_id`, `lists`.`shared_yn`, `tags`.`name` AS `context`, `tags`.`id` AS `context_id`, `tags`.`color` AS `context_color`, `users`.`id` AS `user_id`, `users`.`avatar` 
FROM `contacts`  
LEFT JOIN `lists` ON lists.id=contacts.list_id  
LEFT JOIN `lists_to_users` ON lists_to_users.list_id=lists.id AND lists_to_users.user_id='1' AND lists_to_users.creator='1'  
LEFT JOIN `tags` ON tags.id=lists_to_users.tag_id 
INNER JOIN `users` ON users.email=contacts.email 
WHERE (contacts.user_id='1') 
ORDER BY `contacts`.`name` ASC

(The clause that I'm talking about is the last INNER JOIN on the 'users' table)

The query runs on a MySQL 5.1 database, if it makes a difference.

Does anyone has a clue on why the LEFT JOIN'ed query outperforms the INNER JOIN'ed on in this case?

UPDATE: Due to Tomalak's suggestion that the small tables I'm using were making the INNER JOIN more complex, I'd created a test database with some mock data. The 'users' table is 5000 rows, and the contacts table is ~500,000 rows. The results are the same (also the timings haven't changed which is surprising when you consider that the tables are much bigger now).

I also ran ANALYZE and OPTIMIZE on the contacts table. Didn't make any discernible difference.

+3  A: 

It's probably due to the INNER JOIN having to check each row in both tables to see if the column values (email in your case) match. The LEFT JOIN will return all from one table regardless. If it's indexed then it will know what to do faster too.

HAdes
I tried using an index on the email column, and a combined index on the name + email columns, but the query execution plan remains the same
Eran Galperin
That will help both the INNER and LEFT joins I guess, so I wouldn't have thought it would make one faster than the other by doing so.
HAdes
The inner join scans one table and find matching rows in the other, ideally using and index for that. It does not have to check each row in both tables as you suggest.
Tomalak
Isn't that what the scan is doing? Why else would there be a performance difference.
HAdes
+3  A: 

Table cardinality has an influence on the query optimizer. I guess small tables as you have make the inner join the more complex operation. As soon as you have more records than the DB server is willing to keep in memory, the inner join will probably begin to outperform the left join.

Tomalak
That's interesting. I'll have to check on a larger set and see if it performs how you described it.
Eran Galperin
I re-ran with much larger tables and the results are the same.
Eran Galperin
+1  A: 

imo you are falling into the pitfall known as premature optimization. Query optimizers are insanely fickle things. My suggestion, is to move on until you can identify for sure that the a particular join is problematic.

Greg Dean
This is not about optimization, this is about understanding why the query behaves in a certain way.
Eran Galperin
A: 

LEFT JOIN is returning more rows than INNER JOIN because these 2 are different.
If LEFT JOIN does not find related entry in the table it is looking for, it will return NULLs for the table.
But if INNER JOIN does not find related entry, it will not return the whole row at all.

But to your question, do you have query_cache enabled? Try running the query with

SELECT SQL_NO_CACHE `contacts`.*, ...

Other than that, I'd populate the tables with more data, ran

ANALYZE TABLE t1, t2;
OPTIMIZE TABLE t1, t2;

And see what happens.

michal kralik
Of course the left join returns more rows, that's not the point of the question. Why it runs more quickly WHILE returning more rows is what boggles me
Eran Galperin
+5  A: 

If you think that the implementation of LEFT JOIN is INNER JOIN + more work, then this result is confusing. What if the implementation of INNER JOIN is (LEFT JOIN + filtering)? Ah, it is clear now.

In the query plans, the only difference is this: users... extra: using where . This means filtering. There's an extra filtering step in the query with the inner join.


This is a different kind of filtering than is typically used in a where clause. It is simple to create an index on A to support this filtering action.

SELECT *
FROM A
WHERE A.ID = 3

Consider this query:

SELECT *
FROM A
  LEFT JOIN B
  ON A.ID = B.ID
WHERE B.ID is not null

This query is equivalent to inner join. There is no index on B that will help that filtering action. The reason is that the where clause is stating a condition on the result of the join, instead of a condition on B.

David B
I am aware of the difference between a left join and an inner join. You could say the same about the WHERE clause, however queries filtered with a where clause usually take much less time to compute.
Eran Galperin
I read what you added, and though I think you might be on to something with the extra filtering step, I think you are off target as to why. There is an index on the extra filtering column 'email' (which is used), so it should be fast enough to improve performance.
Eran Galperin
Yes, the index on email does help the left join. No, the index on email does not allow fast filtering of the post-join results.
David B
A: 

Try this:

SELECT `contacts`.*, `lists`.`name` AS `group`, `lists`.`id` AS `group_id`, `lists`.`shared_yn`, `tags`.`name` AS `context`, `tags`.`id` AS `context_id`, `tags`.`color` AS `context_color`, `users`.`id` AS `user_id`, `users`.`avatar` 
FROM `contacts`  
INNER JOIN `users` ON contacts.user_id='1' AND users.email=contacts.email
LEFT JOIN `lists` ON lists.id=contacts.list_id  
LEFT JOIN `lists_to_users` ON lists_to_users.user_id='1' AND lists_to_users.creator='1' AND lists_to_users.list_id=lists.id
LEFT JOIN `tags` ON tags.id=lists_to_users.tag_id 
ORDER BY `contacts`.`name` ASC

That should give you an extra performance because:

  • You put all the inner joins before any "left" or "right" join appears. This filters out some records before applying the subsequent outer joins
  • The short-circuit of the "AND" operators (order of the "AND" matters). If the comparition between the columns and the literals is false, it won't execute the required table scan for the comparition between the tables PKs and FKs

If you don't find any performance improvement, then replace all the columnset for a "COUNT(*)" and do your left/inner tests. This way, regardless of the query, you will retrieve only 1 single row with 1 single column (the count), so you can discard that the number of returned bytes is the cause of the slowness of your query:

SELECT COUNT(*)
FROM `contacts`  
INNER JOIN `users` ON contacts.user_id='1' AND users.email=contacts.email
LEFT JOIN `lists` ON lists.id=contacts.list_id  
LEFT JOIN `lists_to_users` ON lists_to_users.user_id='1' AND lists_to_users.creator='1' AND lists_to_users.list_id=lists.id
LEFT JOIN `tags` ON tags.id=lists_to_users.tag_id

Good luck