The first syntax is generally more efficient.
MySQL buffers derived queries, so using a derived query robs user_profile of the possibility of being the driven table in the join.
Even if user_profile is leading, the subquery results have to be buffered first, which has a memory and performance cost.
A LIMIT applied to the queries will make the first query much faster, which is not true for the second one.
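To make the LIMIT point concrete, here is a rough sketch of the two forms with a LIMIT added. It assumes the t_source table used in the sample plans that follow; the LIMIT value of 10 is arbitrary and chosen only for illustration.

```sql
-- Plain join: rows are produced as matches are found,
-- so the scan can stop once 10 rows have been returned.
SELECT  *
FROM    t_source s1
JOIN    t_source s2
ON      s2.nid = s1.id
WHERE   s2.val = 1
LIMIT   10;

-- Derived table: under the buffering behaviour described above,
-- the inner query is fully materialized before the join starts,
-- so the LIMIT only trims the output after that work is done.
SELECT  *
FROM    t_source s1
JOIN    (
        SELECT  nid
        FROM    t_source s2
        WHERE   val = 1
        ) q
ON      q.nid = s1.id
LIMIT   10;
```

Prefixing each statement with EXPLAIN on your own data should show whether the derived table is still being materialized in your MySQL version.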
Here are the sample plans. There is an index on (val, nid) in the table t_source:
First query:
EXPLAIN
SELECT *
FROM t_source s1
JOIN t_source s2
ON s2.nid = s1.id
WHERE s2.val = 1
1, 'SIMPLE', 's1', 'ALL', 'PRIMARY', '', '', '', 1000000, ''
1, 'SIMPLE', 's2', 'ref', 'ix_source_val,ix_source_val_nid,ix_source_vald_nid', 'ix_source_val_nid', '8', 'const,test.s1.id', 1, 'Using where'
Second query:
EXPLAIN
SELECT *
FROM t_source s1
JOIN (
SELECT nid
FROM t_source s2
WHERE val = 1
) q
ON q.nid = s1.id
1, 'PRIMARY', '<derived2>', 'ALL', '', '', '', '', 100000, ''
1, 'PRIMARY', 's1', 'ref', 'PRIMARY', 'PRIMARY', '4', 'q.nid', 10000, 'Using where'
2, 'DERIVED', 's2', 'ref', 'ix_source_val,ix_source_val_nid,ix_source_vald_nid', 'ix_source_vald_nid', '4', '', 91324, 'Using index'
As you can see, only a part of the index is used in the second case (key_len is 4 rather than 8), and q is forced to be leading.
Update:
Derived queries (which is what this question concerns) are not to be confused with subqueries.
While MySQL is not able to optimize derived queries (those used in the FROM clause), subqueries (those used with IN or EXISTS) are treated much better.
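For contrast, a subquery in the sense used here might look like the sketch below. It reuses the t_source table from the plans above. Note that it is not an exact equivalent of the join: it returns only columns of s1, and each s1 row at most once.

```sql
-- Subquery used with IN (a WHERE-clause predicate),
-- as opposed to a derived query placed in the FROM clause.
SELECT  *
FROM    t_source s1
WHERE   s1.id IN
        (
        SELECT  nid
        FROM    t_source s2
        WHERE   s2.val = 1
        );

-- The same condition written with EXISTS.
SELECT  *
FROM    t_source s1
WHERE   EXISTS
        (
        SELECT  1
        FROM    t_source s2
        WHERE   s2.nid = s1.id
                AND s2.val = 1
        );
```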
See these articles in my blog for more detail: