Method 1 - improving the original query
I am pretty sure that in this case an INNER JOIN will suffice; there is no reason to do a RIGHT JOIN (if an id exists in comm, it will also exist in comments). INNER JOINs can result in better performance.
Moreover, you really want to push the LIMIT 10 inside comm (incidentally, keeping it together with the ORDER BY):
- for one, not keeping LIMIT 10 and ORDER BY together will not get you the ten most recently posted-to topics (the ordering of the comm subquery will not necessarily be preserved into the final result which you are LIMITing.)
- additionally, enforcing the LIMIT inside the innermost, aggregate subquery will encourage cost-based optimizers to favour nested loops (10, to be exact) over hash or merge joins (the 10 nested loops being by far the fastest for any respectably-sized comments table.)
So, your query should be rewritten as:
SELECT `comments`.* FROM `comments`
INNER JOIN (
SELECT MAX( id ) AS id, core_id, topic_id
FROM comments
GROUP BY core_id, topic_id
ORDER BY id DESC
LIMIT 10
) comm
ON comm.id = comments.id
ORDER BY comments.id
Finally, use EXPLAIN to see what the query is doing. Do not forget to check that you have created an index on comments.id to help with the JOIN nested loops.
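For example, assuming MySQL (which the backtick quoting suggests), something along these lines; the index names here are hypothetical, and if comments.id is already the PRIMARY KEY then the first index is implicit:
-- hypothetical names; skip the first index if comments.id is the PRIMARY KEY
CREATE INDEX comments_id_idx ON comments (id);
-- a composite index can let the comm subquery be resolved from the index alone
CREATE INDEX comments_core_topic_id_idx ON comments (core_id, topic_id, id);
-- prefix the query with EXPLAIN to inspect the plan
EXPLAIN
SELECT `comments`.* FROM `comments`
INNER JOIN (
SELECT MAX( id ) AS id, core_id, topic_id
FROM comments
GROUP BY core_id, topic_id
ORDER BY id DESC
LIMIT 10
) comm
ON comm.id = comments.id
ORDER BY comments.id;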
Method 2 - a different approach
Note that while the above query could still be faster than your original query, the innermost comm subquery may still turn out to be a significant bottleneck if it results in a full table scan of comments. This really depends on how smart the database is when it sees GROUP BY, ORDER BY and LIMIT together.
If EXPLAIN indicates that the subquery is doing a table scan, then you can try a combination of SQL and application-level logic to get the best performance, assuming that I have understood your requirement correctly and that you want to identify the ten most recent comments posted in ten different topics:
# Python sketch of the idea, using a DB-API connection; use a driver
# cursor that does not buffer the whole result set client-side
# (e.g. MySQLdb's SSCursor), so rows stream from the server
core_topics_map = {}
cursor = connection.cursor()
cursor.execute("SELECT id, core_id, topic_id FROM comments ORDER BY id DESC")

# iterate over the result set, betting that we will be able to break
# early, bringing only a handful of rows over from the database server
for comment_id, core_id, topic_id in cursor:
    # have we identified our 10 most recent topics?
    if len(core_topics_map) >= 10:
        break
    core_topic_key = (core_id, topic_id)
    # the first id seen per topic is its latest, since we scan id DESC
    if core_topic_key not in core_topics_map:
        core_topics_map[core_topic_key] = comment_id
cursor.close()

# sort our 10 topics in reverse chronological order (latest comment first)
top_topics = sorted(core_topics_map.items(), key=lambda kv: kv[1], reverse=True)
In most cases (that is, provided that your application's database driver does not attempt to buffer all rows into memory for you before giving you back control from execute), the above will only fetch a handful of rows, always using an index, with no table scans involved.
Method 3 - a hybrid approach
If ten seconds ago I knew what the ten most recent comments were, can I be smart about it when I ask the question again later on? Unless comments can be deleted from the database, the answer is yes, because I know that, when I ask the question again, all comment IDs will be greater than or equal to the oldest comment ID I got in my last query.
I can therefore rewrite the innermost query to be much, much more selective using an additional condition, WHERE id >= :last_lowest_id:
SELECT `comments`.* FROM `comments`
INNER JOIN (
SELECT MAX( id ) AS id, core_id, topic_id
FROM comments
WHERE id >= :last_lowest_id
GROUP BY core_id, topic_id
ORDER BY id DESC
LIMIT 10
) comm
ON comm.id = comments.id
ORDER BY comments.id
When you run the query for the very first time, use 0 for :last_lowest_id. The query will return up to 10 rows, in descending order. Inside your application, put aside the id of the last row, and reuse its value as :last_lowest_id the next time you run the query, and repeat (again, put aside the id of the last row returned by the latest query, etc.) This will essentially make the query incremental, and extremely fast.
Example:
- running the query for the first time with :last_lowest_id set to 0
- returns 10 rows with IDs: 129, 100, 99, 88, 83, 79, 78, 75, 73, 70
- save 70
- running the query for the second time with :last_lowest_id set to 70
- returns 10 rows with IDs: 130, 129, 100, 99, 88, 83, 79, 78, 75, 73
- save 73
- etc.
Method 4 - yet another approach
If you expect to perform SELECT ... ORDER BY id DESC LIMIT 10 much more often than INSERTs into the comments table, consider putting a bit more work into the INSERT to make the SELECT faster. Thus, you can add an indexed updated_at column to your topics etc. table, and whenever you INSERT a comment into the comments table also update the corresponding topic's updated_at value to NOW(). You can then easily select the 10 most recently updated topics (a simple and short index scan on updated_at returning 10 rows), inner joining with the comments table to get the MAX(id) for those 10 topics (infinitely more efficient than getting MAX(id) for all topics before picking the ten greatest, as in the original and Method 1), then inner joining again on comments to get the rest of the column values for those 10.
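As a sketch, assuming a topics table keyed by (core_id, topic_id) to match the grouping above (the actual topics schema was not shown, so adjust the names), Method 4 could look like:
-- hypothetical schema change: an indexed updated_at column on topics
ALTER TABLE topics ADD COLUMN updated_at DATETIME;
CREATE INDEX topics_updated_at_idx ON topics (updated_at);

-- on every comment INSERT, also touch the corresponding topic
UPDATE topics SET updated_at = NOW()
WHERE core_id = :core_id AND topic_id = :topic_id;

-- the SELECT: 10 topics via a short index scan on updated_at,
-- MAX(id) computed for those 10 topics only, then the full rows
SELECT `comments`.* FROM `comments`
INNER JOIN (
SELECT MAX( c.id ) AS id
FROM (
SELECT core_id, topic_id
FROM topics
ORDER BY updated_at DESC
LIMIT 10
) t
INNER JOIN comments c
ON c.core_id = t.core_id AND c.topic_id = t.topic_id
GROUP BY c.core_id, c.topic_id
) comm
ON comm.id = comments.id
ORDER BY comments.id;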
I expect the overall performance of Method 4 to be comparable with Methods 2 and 3. Method 4 will have to be used if you need to get arbitrary topics (e.g. by paginating them, LIMIT 10 OFFSET 50) or if topics or comments can be removed (no changes are necessary to support topic removal; to support comment removal properly, the topic's updated_at should be updated on both comment INSERT and DELETE with the created_at value of the latest non-deleted comment for the topic, as sketched below.)
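The comment-removal maintenance might then look like this (same hypothetical (core_id, topic_id) key; updated_at becomes NULL if the topic has no comments left):
-- on comment DELETE, fall back to the created_at of the latest surviving comment
UPDATE topics
SET updated_at = (
SELECT MAX( created_at )
FROM comments
WHERE core_id = :core_id AND topic_id = :topic_id
)
WHERE core_id = :core_id AND topic_id = :topic_id;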
Cheers,
V.