views:

2992

answers:

2

Hey all,

I built a custom forum for my site using MySQL. The listing page is essentially a table with the following columns: Topic, Last Updated, and # Replies.

The DB table has the following columns:

id
name
body
date
topic_id
email

A topic has the topic_id of "0", and replies have the topic_id of their parent topic.

SELECT SQL_CALC_FOUND_ROWS
    t.id, t.name, MAX(COALESCE(r.date, t.date)) AS date, COUNT(r.id) AS replies
FROM
    wp_pod_tbl_forum t
LEFT OUTER JOIN
    wp_pod_tbl_forum r ON (r.topic_id = t.id)
WHERE
    t.topic_id = 0
GROUP BY
    t.id
ORDER BY
    date DESC LIMIT 0,20;

There are about 2,100 total items in this table, and queries usually take a whopping 6 seconds. I added an INDEX to the "topic_id" column, but that didn't help much. Are there any ways of speeding up this query w/out doing significant restructuring?

EDIT: not quite working yet. I can't seem to get the examples below to work properly.

+2  A: 
SELECT  id, name, last_reply, replies
FROM    (
        SELECT  topic_id, MAX(date) AS last_reply, COUNT(*) AS replies
        FROM    wp_pod_tbl_forum
        GROUP BY
                topic_id
        ) r
JOIN    wp_pod_tbl_forum t
ON      t.topic_id = 0
        AND t.id = r.topic_id
UNION ALL
SELECT  id, name, date, 0
FROM    wp_pod_tbl_forum t
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    wp_pod_tbl_forum r
        WHERE   r.topic_id = t.id
        )
        AND t.topic_id = 0
ORDER BY
       date DESC
LIMIT 0, 20

If your table is MyISAM or id is not a PRIMARY KEY, you need to create a composite ondex on (topic_id, id).

If your table is InnoDB and id is a PRIMARY KEY, an index just on (topic_id) will do (id will be implicitly added to the index).

Update

This query will most probably be even more efficient, provided that you have indexes on (topic_id, id) and (date, id):

See this article in my blog for performance details:

This query completes in 30 ms on a 100,000 rows sample data:

SELECT  id, name, last_reply,
        (
        SELECT  COUNT(*)
        FROM    wp_pod_tbl_forum fc
        WHERE   fc.topic_id = fl.topic_id
        ) AS replies
FROM    (
        SELECT  topic_id, date AS last_reply
        FROM    wp_pod_tbl_forum fo
        WHERE   id = (
                SELECT  id
                FROM    wp_pod_tbl_forum fp
                WHERE   fp.topic_id = fo.topic_id
                ORDER BY
                        fp.date DESC, fp.id DESC
                LIMIT 1
                )
                AND fo.topic_id <> 0
        ORDER BY
                fo.date DESC, fo.id DESC
        LIMIT 20
        ) fl
JOIN    wp_pod_tbl_forum ft
ON      ft.id = fl.topic_id
UNION ALL
SELECT  id, name, date, 0
FROM    wp_pod_tbl_forum t
WHERE   NOT EXISTS
        (
        SELECT  NULL
        FROM    wp_pod_tbl_forum r
        WHERE   r.topic_id = t.id
        )
        AND t.topic_id = 0
ORDER BY
       last_reply DESC, id DESC
LIMIT  20

Both indexes are required for this query to be efficient.

If your table is InnoDB and id is a PRIMARY KEY, then you can omit id from the indexes above.

Quassnoi
Column 'date' in field list is ambiguous..?
Matt
@Matt: see update
Quassnoi
@Quassnoi - can you explain what's going on? Does the "UNION ALL" replace "last_reply" with "date" if the topic has no replies?
Matt
LEFT JOIN is inefficient in your case, since you cannot use INDEX FOR GROUP BY to group by a column in table which does not lead in a JOIN. I rewrote your query as a combination of two other queries, first one using an INDEX FOR GROUP BY to build the aggregates, the second one being joinless and groupless.
Quassnoi
Thanks, I'll give it a try.
Matt
+1  A: 

You may want to break it up into a set of subqueries (as inner queries). I'd need the schema to really play, but if you

SELECT t.id, t.name, MAX(COALESCE(r.date, t.date)) AS date, COUNT(r.id) AS replies  
FROM (
   SELECT (id, name, date)
   FROM wp_pod_tbl_forum
   WHERE topic_id = 0  
) as t 
LEFT OUTER JOIN
   wp_pod_tbl_forum r
WHERE
   r.topic_id = t.id
GROUP BY
    t.id
ORDER BY
    date DESC LIMIT 0,20;

that may help speed it up a little, it may not even be the best answer (errors may exist).

There are tons of ways to do it, but the most important thing to do when SQL tuning is to reduce each set as much as possible before performing an operation.

Dave Morgan