tags:

views:

176

answers:

4

I'm working on building a forum with kohana. I know there is already good, free, forum software out there, but it's for a family site, so I thought I'd use it as a learning experience. I'm also not using the ORM that is built into Kohana, as I would like to learn more about SQL in the process of building the forum.

For my forum I have 4 main tables:

  • USERS
  • TOPICS
  • POSTS
  • COMMENTS

TOPICS table: id (auto incremented), topic row.

USERS table: username, email, first and last name and a few other non related rows

POSTS table: id (auto incremented), post-title, post-body, topic-id, user-id, post-date, updated-date, updated-by(which will contain the user-id of the person who made the most recent comment)

COMMENTS table: id (auto incremented), post-id, user-id and comment


On the main forum page I would like to have:

  • a list of all of the topics
  • the number of posts for each topic
  • the last updated post, and who updated it
  • the most recently updated topic to be on top, most likely an "ORDER BY updated-date"

Here is the query I have so far:

SELECT topics.id AS topic-id, 
       topics.topic, 
       post-user.id AS user-id, 
       CONCAT_WS(' ', post-user.first-name, post-user.last-name) AS name, 
       recent-post.id AS post-id, 
       post-num.post-total, 
       recent-post.title AS post-title, 
       recent-post.update_date AS updated-date, 
       recent-post.updated-by AS updated-by
  FROM topics
  JOIN (SELECT posts.topic-id,
               COUNT(*) AS post-total                 
          FROM POSTS
         WHERE posts.topic-id = topic-id 
      GROUP BY posts.topic-id) AS post-num ON topics.id = post-num.topic-id
  JOIN (SELECT posts.* 
          FROM posts 
      ORDER BY posts.update-date DESC) AS recent-post ON topics.id = recent-post.topic-id 
  JOIN  (SELECT users.*, 
                posts.user-id 
           FROM users, posts 
          WHERE posts.user-id = users.id) as post-user ON recent-post.user_id = post-user.id 
GROUP BY topics.id

This query almost works as it will get all of information for topics that have posts. But it doesn't return the topics that don't have any posts.

I'm sure that the query is inefficient and wrong since it makes two sub-selects to the posts table, but it was the only way I could get to the point I'm at.

A: 

to ensure you get results for topics without posts, you'll need to use LEFT JOIN instead of JOIN for the first join between topics and the next table. LEFT JOIN means "always return a result set row for every row in the left table, even if there's no match with the right table."

Gotta go now, but I'll try to look at the efficiency issues later.

Justin Grant
worked like a charm! thanks.This site is going to be really small scale, as I said its only for my family. There is only about 30 of us, I doubt I would run into any slow downs due to the query. But any other cleaning up that you could suggest would be awesome thanks
iangraham
A: 

I'd use a left join inside a subquery to pull back the correct topic, and then you can do a little legwork outside of that to get some of the user info.

select
    s.topic_id,
    s.topic,
    u.user_id as last_updated_by_id,
    u.user_name as last_updated_by,
    s.last_post,
    s.post_count
from
    (
        select
            t.id as topic_id,
            t.topic,
            t.user_id as orig_poster,
            max(coalesce(p.post_date, t.post_date)) as last_post,
            count(*) as post_count --would be p.post_id if you don't want to count the topic
        from
            topics t
            left join posts p on
                t.id = p.topic_id
        group by
            t.topic_id,
            t.topic,
            t.user_id
    ) s
    left join posts p on
        s.topic_id = p.topic_id
        and s.last_post = p.post_date
        and s.post_count > 1 --0 if you're using p.post_id up top
    inner join users u on
        u.id = coalesce(p.user_id, s.orig_poster)
order by 
    s.last_post desc

This query does introduce coalesce and left join, and they are very good concepts to look into. For two arguments (like used here), you can also use ifnull in MySQL, since it is functionally equivalent.

Keep in mind that that's exclusive to MySQL (if you need to port this code). Other databases have other functions for that (isnull in SQL Server, nvl in Oracle, etc., etc.). I used coalesce so that I could keep this query all ANSI-fied.

Eric
A: 

This is a very complicated query. You should note that JOIN statements will limit your topics to those that have posts. If a topic does not have a post, a JOIN statement will filter it out.

Try the following query.

SELECT * 
FROM
(
  SELECT T.Topic, 
         COUNT(AllTopicPosts.ID) NumberOfPosts, 
         MAX(IFNULL(MostRecentPost.Post-Title, '') MostRecentPostTitle,
         MAX(IFNULL(MostRecentPostUser.UserName, '') MostRecentPostUser
         MAX(IFNULL(MostRecentPost.Updated_Date, '') MostRecentPostDate
  FROM TOPICS
  LEFT JOIN POSTS AllTopicPosts ON AllTopicPosts.Topic_Id = TOPICS.ID
  LEFT JOIN 
     (
       SELECT * 
       FROM Posts P
       WHERE P.Topic_id = TOPICS.id
       ORDER BY P.Updated_Date DESC
       LIMIT 1
     ) MostRecentPost ON MostRecentPost.Topic_Id = TOPICS.ID
  LEFT JOIN USERS MostRecentPostUser ON MostRecentPostUser.ID = MostRecentPost.User_Id
  GROUP BY T.Topic
)
ORDER BY MostRecentPostDate DESC
David Andres
Why are you doing a `max` on the title and user? If you had a topic "ZOMG! Ponies!" and a "ABC123", the "ZOMG! Ponies!" would return regardless of whether or not "ABC123" was the last one updated...
Eric
You might be right. I am grouping by Topic only, and I know in other systems you'll fail the SELECT if you don't include the columns in either the GROUP BY clause or aggregated within the SELECT statement. I know MySQL works differently in this regard, but for the life of me I can't write something I know won't work in other systems.
David Andres
+1  A: 
  • Dash is not a valid character in SQL identifiers, but you can use "_" instead.
  • You don't necessarily have to get everything from a single SQL query. In fact, trying to do so makes it harder to code, and also sometimes makes it harder for the SQL optimizer to execute.
  • It makes no sense to use ORDER BY in a subquery.
  • Name your primary key columns topic_id, user_id, and so on (instead of "id" in every table), and you won't have to alias them in the select-list.

Here's how I would solve this:

First get the most recent post per topic, with associated user information:

SELECT t.topic_id, t.topic,
  u.user_id, CONCAT_WS(' ', u.first_name, u.last_name) AS full_name,
  p.post_id, p.title, p.update_date, p.updated_by
FROM topics t
INNER JOIN 
  (posts p INNER JOIN users u ON (p.updated_by = u.user_id))
  ON (t.topic_id = p.topic_id)
LEFT OUTER JOIN posts p2
  ON (p.topic_id = p2.topic_id AND p.update_date < p2.update_date)
WHERE p2.post_id IS NULL;

Then get the counts of posts per topic in a separate, simpler query.

SELECT t.topic_id, COUNT(*) AS post_total
FROM topics t LEFT OUTER JOIN posts p USING (topic_id)
GROUP BY t.topic_id;

Merge the two data sets in your application.

Bill Karwin
MySQL lets you put an `inner join` inside of an `inner join`? Who wrote that database, Xzibit? Although, it's a rather neat trick.
Eric
Yeah, parenthesized joins is actually standard SQL, not only a MySQL-ism.
Bill Karwin
The technique can be especially useful when you want to outer join to a pair of tables that are related with an inner join, i.e. `X LEFT OUTER JOIN (Y INNER JOIN Z)`
Bill Karwin
I'll be, so it is (I was skeptical, so tried it on my local SQL Server instance). This is equivalent to `select * from a inner join (select * from b inner join c on b.id = c.id) on a.id = b.id`, correct? Obviously, that's the same as it would be without the parens, but I'm assuming for outer join behavior, it can give you some fairly gnarly results if you're not careful.
Eric
Yes, it produces a similar result, but without using a subquery. Parentheses are parentheses, and you can use them to control order of evaluation, just like in an arithmetic expression. And yes, you can get gnarly results with outer join if you're not careful, but that's always true! :-)
Bill Karwin