There are many ways. Here's one approach that I like (and use on a regular basis).
The database
Consider the following database structure:
CREATE TABLE comments (
id int(11) unsigned NOT NULL auto_increment,
parent_id int(11) unsigned default NULL,
parent_path varchar(255) NOT NULL,
comment_text varchar(255) NOT NULL,
date_posted datetime NOT NULL,
PRIMARY KEY (id)
);
Your data will look like this (date_posted is shown here as a Unix timestamp):
+----+-----------+-------------+---------------------------+-------------+
| id | parent_id | parent_path | comment_text              | date_posted |
+----+-----------+-------------+---------------------------+-------------+
| 1  | null      | /           | I'm first                 | 1288464193  |
| 2  | 1         | /1/         | 1st Reply to I'm First    | 1288464463  |
| 3  | null      | /           | Well I'm next             | 1288464331  |
| 4  | null      | /           | Oh yeah, well I'm 3rd     | 1288464361  |
| 5  | 3         | /3/         | reply to I'm next         | 1288464566  |
| 6  | 2         | /1/2/       | this is a 2nd level reply | 1288464193  |
+----+-----------+-------------+---------------------------+-------------+
... and so on...
It's fairly easy to select everything in a usable way:
select id, parent_path, parent_id, comment_text, date_posted
from comments
order by parent_path, date_posted;
Ordering by parent_path, date_posted will usually produce results in the order you'll need them when you generate your page (rows that share a parent come out together, oldest first). Be sure you have an index on the comments table that properly supports this. Without one the query still works, but it's really, really inefficient:
create index comments_hier_idx on comments (parent_path, date_posted);
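To show how the ordered rows might be turned into a nested page, here is a minimal Python sketch. It assumes the rows arrive as dicts with the column names from the table above; build_tree and render are hypothetical helper names, not part of any library. It relies only on parent_path: a comment's replies are the rows whose parent_path equals that comment's parent_path plus its id and a slash.

```python
from collections import defaultdict


def build_tree(rows):
    """Nest flat comment rows using parent_path alone (parent_id is not needed)."""
    # Group rows by the path of their parent; input order is kept within each group.
    children = defaultdict(list)
    for row in rows:
        children[row["parent_path"]].append(row)

    def attach(path):
        # A comment's replies are stored under "<its parent_path><its id>/".
        return [
            {**row, "replies": attach(f"{path}{row['id']}/")}
            for row in children.get(path, [])
        ]

    return attach("/")


def render(nodes, depth=0):
    """Flatten the nested tree into indented lines, parents before replies."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + node["comment_text"])
        lines.extend(render(node["replies"], depth + 1))
    return lines


# Sample rows from the table above, sorted the same way as the query.
sample = [
    {"id": 1, "parent_path": "/", "comment_text": "I'm first", "date_posted": 1288464193},
    {"id": 2, "parent_path": "/1/", "comment_text": "1st Reply to I'm First", "date_posted": 1288464463},
    {"id": 3, "parent_path": "/", "comment_text": "Well I'm next", "date_posted": 1288464331},
    {"id": 4, "parent_path": "/", "comment_text": "Oh yeah, well I'm 3rd", "date_posted": 1288464361},
    {"id": 5, "parent_path": "/3/", "comment_text": "reply to I'm next", "date_posted": 1288464566},
    {"id": 6, "parent_path": "/1/2/", "comment_text": "this is a 2nd level reply", "date_posted": 1288464193},
]
rows = sorted(sample, key=lambda r: (r["parent_path"], r["date_posted"]))
print("\n".join(render(build_tree(rows))))
```

Because siblings keep the order the query gave them, the date_posted ordering carries through to every level of the rendered thread.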
For any given single comment, it's easy to get that comment's entire tree of child-comments. Just add a where clause:
select id, parent_path, parent_id, comment_text, date_posted
from comments
where parent_path like '/1/%'
order by parent_path, date_posted;
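The pattern is built from the comment's own parent_path plus its id and a trailing slash, so comment 1 (whose parent_path is /) gives /1/%. Because the pattern starts with a fixed prefix, the added where clause can make use of the same index we already defined, so we're good to go.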
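For a comment that isn't at the root, the like pattern has to be built from that comment's own row. A rough sketch of how the middle tier might do it, using Python's sqlite3 module as a stand-in for MySQL (an assumption made so the example runs anywhere); fetch_subtree and make_db are hypothetical names:

```python
import sqlite3

# Sample rows from the table above: id, parent_id, parent_path, comment_text, date_posted.
SAMPLE = [
    (1, None, "/", "I'm first", 1288464193),
    (2, 1, "/1/", "1st Reply to I'm First", 1288464463),
    (3, None, "/", "Well I'm next", 1288464331),
    (4, None, "/", "Oh yeah, well I'm 3rd", 1288464361),
    (5, 3, "/3/", "reply to I'm next", 1288464566),
    (6, 2, "/1/2/", "this is a 2nd level reply", 1288464193),
]


def make_db():
    """Create an in-memory SQLite copy of the comments table (stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table comments (id integer primary key, parent_id integer, "
        "parent_path text not null, comment_text text not null, "
        "date_posted integer not null)"
    )
    conn.executemany("insert into comments values (?, ?, ?, ?, ?)", SAMPLE)
    return conn


def fetch_subtree(conn, comment_id):
    """Return every descendant of comment_id, ordered by parent_path, date_posted."""
    row = conn.execute(
        "select parent_path from comments where id = ?", (comment_id,)
    ).fetchone()
    if row is None:
        return []
    # Descendants live under "<parent_path><id>/"; paths hold only digits and
    # slashes, so the prefix needs no LIKE escaping.
    prefix = f"{row[0]}{comment_id}/"
    return conn.execute(
        "select id, parent_path, parent_id, comment_text, date_posted "
        "from comments where parent_path like ? "
        "order by parent_path, date_posted",
        (prefix + "%",),
    ).fetchall()


conn = make_db()
for row in fetch_subtree(conn, 1):
    print(row)
```

The trailing slash in the prefix matters: without it, the pattern for comment 1 would also match the subtree of comment 10.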
Notice that we haven't used the parent_id yet. In fact, it's not strictly necessary. But I include it because it allows us to define a traditional foreign key to enforce referential integrity and to implement cascading deletes and updates if we want to. MyISAM tables don't enforce foreign key constraints or cascading rules, so the table needs to use InnoDB:
ALTER TABLE comments ENGINE=InnoDB;
ALTER TABLE comments
ADD FOREIGN KEY (parent_id) REFERENCES comments (id)
ON DELETE CASCADE
ON UPDATE CASCADE;
Managing The Hierarchy
In order to use this approach, of course, you'll have to make sure you set the parent_path properly when you insert each comment. And if you move comments around (which would admittedly be a strange use case), you'll have to manually update the parent_path of every comment that is subordinate to the moved comment. Both are fairly easy things to keep up with.
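Setting parent_path at insert time could look something like this in the middle tier. This is only a sketch: it uses Python's sqlite3 module in place of MySQL, and insert_comment is a hypothetical helper name. The new comment's path is simply the parent's parent_path followed by the parent's id and a slash.

```python
import sqlite3


def insert_comment(conn, parent_id, text, date_posted):
    """Insert a comment, deriving parent_path from the parent row; returns the new id."""
    if parent_id is None:
        parent_path = "/"  # root-level comment
    else:
        row = conn.execute(
            "select parent_path from comments where id = ?", (parent_id,)
        ).fetchone()
        if row is None:
            raise ValueError(f"no such parent comment: {parent_id}")
        parent_path = f"{row[0]}{parent_id}/"
    cur = conn.execute(
        "insert into comments (parent_id, parent_path, comment_text, date_posted) "
        "values (?, ?, ?, ?)",
        (parent_id, parent_path, text, date_posted),
    )
    return cur.lastrowid


# In-memory SQLite table standing in for the MySQL one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table comments (id integer primary key, parent_id integer, "
    "parent_path text not null, comment_text text not null, "
    "date_posted integer not null)"
)
first = insert_comment(conn, None, "I'm first", 1288464193)
reply = insert_comment(conn, first, "1st Reply to I'm First", 1288464463)
nested = insert_comment(conn, reply, "this is a 2nd level reply", 1288464193)
for row in conn.execute("select id, parent_id, parent_path from comments order by id"):
    print(row)
```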
If you really want to get fancy (and if your db supports it), you can write triggers to manage the parent_path transparently. I'll leave this as an exercise for the reader, but the basic idea is that insert and update triggers would fire before a new insert is committed. They would walk up the tree (using the parent_id foreign key relationship) and rebuild the value of the parent_path accordingly.
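As a very rough illustration of the trigger idea, here is a sketch that runs on SQLite through Python's sqlite3 module. Treat it as an assumption-heavy stand-in: MySQL's trigger syntax is different, the trigger name comments_set_path is made up, it only handles inserts, and instead of walking the whole tree it reads the parent's already-correct parent_path.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table comments (
        id integer primary key,
        parent_id integer references comments (id),
        parent_path text not null default '/',
        comment_text text not null,
        date_posted integer not null
    );

    -- After a reply is inserted, overwrite its placeholder path with
    -- "<parent's parent_path><parent's id>/". Root comments keep the default '/'.
    create trigger comments_set_path after insert on comments
    when new.parent_id is not null
    begin
        update comments
        set parent_path = (
            select p.parent_path || p.id || '/'
            from comments p
            where p.id = new.parent_id
        )
        where id = new.id;
    end;
    """
)

# The application never mentions parent_path; the trigger fills it in.
insert = "insert into comments (parent_id, comment_text, date_posted) values (?, ?, ?)"
conn.execute(insert, (None, "I'm first", 1288464193))
conn.execute(insert, (1, "1st Reply to I'm First", 1288464463))
conn.execute(insert, (2, "this is a 2nd level reply", 1288464193))
for row in conn.execute("select id, parent_id, parent_path from comments order by id"):
    print(row)
```

If the parent row doesn't exist, the subquery yields NULL and the not null constraint on parent_path rejects the insert, which is a side effect of this sketch rather than a substitute for the foreign key.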
It's even possible to break the parent_path out into a separate table that is managed entirely by triggers on the comments table, with a few views or stored procedures to implement the various queries you need. That completely isolates your middle-tier code from the need to know or care about the mechanics of storing the hierarchy info.
Of course, none of the fancy stuff is required by any means -- it's usually quite sufficient to just drop the parent_path into the table, and write some code in your middle-tier to ensure that it gets managed properly along with all the other fields you already have to manage.
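The other piece of middle-tier bookkeeping, moving a comment together with everything under it, might be sketched like this. Again this assumes SQLite via Python's sqlite3 module rather than MySQL, and move_comment is a hypothetical helper. The idea is to swap the old path prefix for the new one on every subordinate row.

```python
import sqlite3


def move_comment(conn, comment_id, new_parent_id):
    """Re-parent a comment and rewrite parent_path for its whole subtree."""
    row = conn.execute(
        "select parent_path from comments where id = ?", (comment_id,)
    ).fetchone()
    if row is None:
        raise ValueError(f"no such comment: {comment_id}")
    old_prefix = f"{row[0]}{comment_id}/"  # path shared by all descendants

    if new_parent_id is None:
        new_parent_path = "/"
    else:
        parent = conn.execute(
            "select parent_path from comments where id = ?", (new_parent_id,)
        ).fetchone()
        if parent is None:
            raise ValueError(f"no such parent comment: {new_parent_id}")
        new_parent_path = f"{parent[0]}{new_parent_id}/"
    if new_parent_path.startswith(old_prefix):
        raise ValueError("cannot move a comment underneath itself")
    new_prefix = f"{new_parent_path}{comment_id}/"

    with conn:  # both updates commit together or not at all
        conn.execute(
            "update comments set parent_id = ?, parent_path = ? where id = ?",
            (new_parent_id, new_parent_path, comment_id),
        )
        # Replace the old prefix with the new one; substr() is 1-indexed.
        conn.execute(
            "update comments set parent_path = ? || substr(parent_path, ?) "
            "where parent_path like ?",
            (new_prefix, len(old_prefix) + 1, old_prefix + "%"),
        )


# In-memory SQLite table loaded with the sample rows from the table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table comments (id integer primary key, parent_id integer, "
    "parent_path text not null, comment_text text not null, "
    "date_posted integer not null)"
)
conn.executemany(
    "insert into comments values (?, ?, ?, ?, ?)",
    [
        (1, None, "/", "I'm first", 1288464193),
        (2, 1, "/1/", "1st Reply to I'm First", 1288464463),
        (3, None, "/", "Well I'm next", 1288464331),
        (4, None, "/", "Oh yeah, well I'm 3rd", 1288464361),
        (5, 3, "/3/", "reply to I'm next", 1288464566),
        (6, 2, "/1/2/", "this is a 2nd level reply", 1288464193),
    ],
)
```

Calling move_comment(conn, 2, 3) would hang comment 2 (and its reply, comment 6) under comment 3.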
Imposing limits
MySQL (and some other databases) allows you to select "pages" of data using the LIMIT clause:
SELECT * FROM mytable LIMIT 25 OFFSET 0;
Unfortunately, when dealing with hierarchical data like this, the LIMIT clause alone won't yield the desired results: it counts every row, replies included, so a page can end partway through a thread.
-- the following will NOT work as intended
select id, parent_path, parent_id, comment_text, date_posted
from comments
order by parent_path, date_posted
LIMIT 25 OFFSET 0;
Instead, we need to do a separate select at the level where we want to impose the limit, then join that back together with our "sub-tree" query to give the final desired results.
Something like this:
select
a.*
from
comments a join
(select id, parent_path
from comments
where parent_id is null
order by parent_path, date_posted DESC
limit 25 offset 0) roots
on (a.parent_path like concat(roots.parent_path, roots.id, '/%') or a.id = roots.id)
order by a.parent_path, a.date_posted DESC;
Notice the statement limit 25 offset 0, buried in the middle of the inner select. This statement will retrieve the most recent 25 "root-level" comments.
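To see the paging behave on the sample data, here is a small sketch that runs the same join with the limit and offset passed in as parameters. It uses Python's sqlite3 module instead of MySQL, so concat() becomes the || operator; fetch_page is a hypothetical name.

```python
import sqlite3


def fetch_page(conn, page_size, offset):
    """Return one page of root comments plus each root's complete subtree."""
    return conn.execute(
        "select a.id, a.parent_path, a.comment_text, a.date_posted "
        "from comments a join "
        "  (select id, parent_path from comments "
        "   where parent_id is null "
        "   order by date_posted desc "
        "   limit ? offset ?) roots "
        "on (a.id = roots.id "
        "    or a.parent_path like (roots.parent_path || roots.id || '/%')) "
        "order by a.parent_path, a.date_posted desc",
        (page_size, offset),
    ).fetchall()


# In-memory SQLite table loaded with the sample rows from the table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table comments (id integer primary key, parent_id integer, "
    "parent_path text not null, comment_text text not null, "
    "date_posted integer not null)"
)
conn.executemany(
    "insert into comments values (?, ?, ?, ?, ?)",
    [
        (1, None, "/", "I'm first", 1288464193),
        (2, 1, "/1/", "1st Reply to I'm First", 1288464463),
        (3, None, "/", "Well I'm next", 1288464331),
        (4, None, "/", "Oh yeah, well I'm 3rd", 1288464361),
        (5, 3, "/3/", "reply to I'm next", 1288464566),
        (6, 2, "/1/2/", "this is a 2nd level reply", 1288464193),
    ],
)

# Two root comments per page: the limit counts roots, never replies.
for row in fetch_page(conn, 2, 0):
    print(row)
```

With a page size of 2, the first page holds the two newest roots (comments 4 and 3) plus the reply to comment 3; the second page holds comment 1 with its full thread.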
[edit: you may find that you have to play with stuff a bit to get the ability to order and/or limit things exactly the way you like. This may include adding information within the hierarchy that's encoded in parent_path. For example: instead of /{id}/{id2}/{id3}/, you might decide to include the date_posted as part of the parent_path: /{id}:{date_posted}/{id2}:{date_posted2}/{id3}:{date_posted3}/. This would make it very easy to get the order and hierarchy you want, at the expense of having to populate the field up-front, and manage it as the data changes.]
hope this helps.
good luck!