I am trying to better understand why this query optimization is so significant (over 100 times faster) so I can reuse similar logic for other queries.
I am using MySQL 4.1. RESET QUERY CACHE and FLUSH TABLES were run before every query, and the timings are consistently reproducible. The only thing that is obvious to me from the EXPLAIN output is that only 5 rows have to be found during the JOIN. But is that the whole answer to the speed difference? Both queries use a partial index (forum_stickies) to determine the deleted status of topics (topic_status=0).
Screenshots for deeper analysis with EXPLAIN
slow query: 0.7+ seconds (cache cleared)
SELECT SQL_NO_CACHE forum_id, topic_id FROM bb_topics
WHERE topic_last_post_id IN
(SELECT SQL_NO_CACHE MAX(topic_last_post_id) AS topic_last_post_id
FROM bb_topics WHERE topic_status=0 GROUP BY forum_id)
fast query: 0.004 seconds or less (cache cleared)
SELECT SQL_NO_CACHE forum_id, topic_id FROM bb_topics AS s1
JOIN
(SELECT SQL_NO_CACHE MAX(topic_last_post_id) AS topic_last_post_id
FROM bb_topics WHERE topic_status=0 GROUP BY forum_id) AS s2
ON s1.topic_last_post_id=s2.topic_last_post_id
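For anyone reproducing this, the two plans can be compared by prefixing each statement with EXPLAIN. This is a minimal sketch that uses only the table and columns from the queries above, with SQL_NO_CACHE dropped since it does not affect the plan. The columns worth comparing are select_type and rows on each line of the output; if the subquery line of the first plan is reported as DEPENDENT SUBQUERY, it is being re-evaluated per outer row rather than once.

```sql
-- Plan for the IN (subquery) form (the slow query)
EXPLAIN
SELECT forum_id, topic_id FROM bb_topics
WHERE topic_last_post_id IN
  (SELECT MAX(topic_last_post_id) AS topic_last_post_id
   FROM bb_topics WHERE topic_status=0 GROUP BY forum_id);

-- Plan for the JOIN against a derived table (the fast query)
EXPLAIN
SELECT s1.forum_id, s1.topic_id FROM bb_topics AS s1
JOIN
  (SELECT MAX(topic_last_post_id) AS topic_last_post_id
   FROM bb_topics WHERE topic_status=0 GROUP BY forum_id) AS s2
ON s1.topic_last_post_id=s2.topic_last_post_id;
```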
Note that there is no index on the most important column (topic_last_post_id), but that cannot be helped (the results are stored for repeated use anyway).
Is the answer simply that the first query has to scan topic_last_post_id TWICE, the second time to match the results against the subquery? If so, why is it over 100 times slower rather than merely twice as slow?
(Less important: I am curious why the first query still takes so long even if I do put an index on topic_last_post_id.)
Update: after much more searching I found this thread on Stack Overflow, which goes into this topic: http://stackoverflow.com/questions/141278/subqueries-vs-joins