views:

247

answers:

7

A forum seems to be an interesting project in terms of schema design.

I was wondering, to get a recordset consisting of an opening thread and replies, what is the best way? I could store all threads (the first, opening post) in a table and have a table with replies. I could then use the appropriate join to get all the replies in the reply table which match with the thread (e.g. unique key match between pk/fk). I could then sort this table based on data of replies to get the start to last post. Would this be a view or temporary table? And why?

Alternatively, I could store everything in one table and use recursion to get the data back. This is a little more complex?

What is the best approach? Or perhaps there is a better way then both of these?

Thanks

A: 

I once had this as an interview question.

I think I said it depends on performance, maintanability and other critiera which is the correct approach, however I went with a multi-table approach.
(threads, messages, etc)

Sadly I never got the job. So you may want to go with the more complex approach.
However I still think the right answer is, it depends.

Bravax
+1  A: 

I like the single table approach with a nullable InReplyTo column.

I do not like the two table approach, because posts and replies share so much structure it's a shame to have two almost identical tables. Which leads to a third possibility...

Have a "Message" table with the common elements, and a "References" table to show the relations between individual messages.

dwc
+1  A: 

Storing hierarchical data in a database has been addressed a few times on SO before. Here is one of the more comprehensive responses.

Sean Bright
A: 

A simple approach would be to store all forum posts in a single table like this:

| id | parent_post | post_content | user | timestamp |

Obviously simplified a little since for most forums you also want to store data like the IP address it was posted from, etc. Then to display a thread, all you need to do is

SELECT post_content,user [...] FROM posts WHERE 
    parent_post = $id ORDER BY timestamp;

This is pseudocode/simplified but you get the idea.

EDIT: I'm assuming you're talking about a standard forum where it's generally one main post and other posts are all "children" of the main post. If you're wanting to design this to allow nested replies like a threaded usenet/email conversation then obviously I wouldn't take this approach!

Jay
A: 

Joe Celko has a book about how to represent tree structures in SQL. It is the most definitive guide I've seen on the topic. Googling for Joe Celko and trees shoudl locate a few of his online articles on the subject.

His "SQL for Smarties" book perhaps?
Kristen
Funnily enough, I have this book. I was trying to step away from the hierarchical approach, however.
dotnetdev
A: 

I agree with Jay

| id | parent_post | post_content | user | timestamp |

but to get the whole Thread you also need to have some sort of Path or Sequence column that presents the Posts in a logical order.

We tends to use both - the Path can be split to provide parentage of the hierarchy. An additional Sequence column, which is valid for a given thread, can be used as a simple ORDER BY to retrieve the Posts in Thread-order - but it is likely to need updating often as replies are inserted between other posts in the thread.

For that reason I would put it in another table - with a 1:1 relationship to the Posts table.

Kristen
A: 

The problem with storing 'parent post' is that you have to scan through the data set repeatedly when rendering it.

Given that forums tend to be rendered more than they're modified, you should optimise for the read, not the write.

A fantastic approach to this is known as Nested Sets. This approach has many performance improvements for selects, including being able to pull back and render sub-trees very easily.

I have used BetterNestedSet for Ruby on Rails before. The idea is simple enough that you could implement it yourself.

Drew Noakes