So I'm trying to create a comment system in which you can reply to comments that are already replies (allowing you to create theoretically infinite threads of replies). I want them to display in chronological order (newest on top), but of course the replies should be directly underneath the original comment. If there are multiple comments replying to the same comment, the replies should also be in chronological order (still underneath the original comment). I also want to limit the number of comment groups (a set of comments with a single comment that is not a reply at all) to, say, 25. How should I set up the MySQL table, and what sort of query would I use to extract what I want?

Here's a simplified version of my DB:

ID int(11) NOT NULL AUTO_INCREMENT,
DatePosted datetime NOT NULL,
InReplyTo int(11) NOT NULL DEFAULT '0',

Sorry if this is kind of confusing, I'm not sure how to word it any differently. I've had this problem in the back of my mind for a couple months now, and every time I solve one problem, I end up with another...

+1  A: 

You should consider nesting your comments in a tree. I'm not that familiar with tree data structures, but something relatively simple is possible, and I'm open to suggestions (and explanations) for optimizing the code. The idea would be something like this:

<?php

    $mysqli = new mysqli('localhost', 'root', '', 'test');  

    /** The class which holds the comments */
    class Comment
    {
        public $id, $parent, $content;
        public $childs = array();

        public function __construct($id, $parent, $content)
        {
            $this->id = $id;
            $this->parent = $parent;
            $this->content = $content;
        }

        public function addChild( Comment $obj )
        {
            $this->childs[] = $obj;
        }

    }


    /** Function to locate an object by its id, to help nest the comments in a hierarchy */ 
    function locateObject( $id, $comments )
    {
        foreach($comments as $commentObject)
        {
            if($commentObject->id == $id)
                return $commentObject;

            if( count($commentObject->childs) > 0 )
            {
                // Only return when this subtree has a match; otherwise keep scanning siblings
                $found = locateObject($id, $commentObject->childs);
                if( $found )
                    return $found;
            }

        }

        return null;
    }

    /** Function to recursively show comments and their nested child comments */
    function showComments( $commentsArray )
    {
        foreach($commentsArray as $commentObj)
        {
            echo $commentObj->id;
            echo $commentObj->content;

            if( count($commentObj->childs) > 0 )
                showComments($commentObj->childs);
        }
    }

    /** SQL to select the comments, ordered by parent and then by date */
    $sql = "SELECT * FROM comment ORDER BY parent, date ASC";
    $result = $mysqli->query($sql);

    $comments = array();

    /** A pretty self-explanatory loop (I hope) */
    while( $row = $result->fetch_assoc() )
    {

        $commentObj = new Comment($row["id"], $row["parent"], $row["content"]);

        if($row["parent"] == 0)
        {
            $comments[] = $commentObj;
            continue;
        }

        $tObj = locateObject($row["parent"], $comments);
        if( $tObj )
            $tObj->addChild( $commentObj );         
        else
            $comments[] = $commentObj;

    }



    /** And then showing the comments*/
    showComments($comments);


?>

I hope you get the general idea. I'm sure some of the other users here can offer more experienced thoughts on my suggestion and help optimize it.

Repox
Well, this certainly should work, and it's probably more optimized than anything I could write anyway. Thanks!
3000farad
A: 

In the database, you can create a table with a foreign key column (parent_comment) that references the comments table itself. For example:

CREATE TABLE comments (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  parent_comment INT NULL,
  date_posted  DATETIME,
  ...,
  FOREIGN KEY (parent_comment) REFERENCES comments(id))

To show the comments for a single item, select all comments for that item and walk them recursively in your script with a depth-first algorithm. The traversal should take chronological order into account when it visits the replies at each level.
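A minimal sketch of that depth-first walk, in Python rather than the PHP used elsewhere in this thread. The function name build_thread_order, the (id, parent_id, date_posted) row shape, and the use of None for "not a reply" are assumptions made for illustration (the question's schema uses InReplyTo = 0 instead). It sorts siblings newest first and caps the number of top-level groups:

```python
from collections import defaultdict


def build_thread_order(rows, max_groups=25):
    """Return (id, depth) pairs in display order.

    rows: iterable of (id, parent_id, date_posted) tuples, where parent_id
    is None for a top-level comment (an assumption of this sketch).
    """
    children = defaultdict(list)
    for comment_id, parent_id, date_posted in rows:
        children[parent_id].append((date_posted, comment_id))

    # Newest first among siblings (ties broken by higher id first).
    for siblings in children.values():
        siblings.sort(reverse=True)

    ordered = []
    # Only the newest max_groups top-level comments start a group.
    roots = children[None][:max_groups]
    # Explicit stack instead of recursion, so very deep threads cannot
    # hit Python's recursion limit. Items are pushed in reverse so the
    # newest one is popped first.
    stack = [(comment_id, 0) for _, comment_id in reversed(roots)]
    while stack:
        comment_id, depth = stack.pop()
        ordered.append((comment_id, depth))
        for _, child_id in reversed(children.get(comment_id, [])):
            stack.append((child_id, depth + 1))
    return ordered


rows = [
    (1, None, 100),
    (2, 1, 110),
    (3, None, 120),
    (4, 1, 130),
    (5, 2, 140),
]
thread = build_thread_order(rows)
for comment_id, depth in thread:
    print("  " * depth + f"comment {comment_id}")
```

The depth value is only there so the page code can indent each reply under its parent.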

Kel
A: 

I would consider a nested set for storing this type of hierarchical data. See http://dev.mysql.com/tech-resources/articles/hierarchical-data.html for an example.

Björn
A: 

You might find this method helpful. It involves a single call to a non-recursive stored procedure.

Full script can be found here : http://pastie.org/1259785

Hope it helps :)

Example stored procedure call:

call comments_hier(1);

Example php script:

<?php

$conn = new mysqli("localhost", "foo_dbo", "pass", "foo_db", 3306);

$result = $conn->query(sprintf("call comments_hier(%d)", 3));

while($row = $result->fetch_assoc()){
    ...
}

$result->close();
$conn->close();
?>

SQL script:

drop table if exists comments;
create table comments
(
comment_id int unsigned not null auto_increment primary key,
subject varchar(255) not null,
parent_comment_id int unsigned null,
key (parent_comment_id)
)engine = innodb;


insert into comments (subject, parent_comment_id) values
('Comment 1',null), 
   ('Comment 1-1',1), 
   ('Comment 1-2',1), 
      ('Comment 1-2-1',3), 
      ('Comment 1-2-2',3), 
        ('Comment 1-2-2-1',5), 
        ('Comment 1-2-2-2',5), 
           ('Comment 1-2-2-2-1',7);


delimiter ;

drop procedure if exists comments_hier;

delimiter #

create procedure comments_hier
(
in p_comment_id int unsigned
)
begin

declare v_done tinyint unsigned default 0;
declare v_depth smallint unsigned default 0;

create temporary table hier(
 parent_comment_id int unsigned, -- same type as comments.comment_id, so large ids are not truncated
 comment_id int unsigned, 
 depth smallint unsigned default 0
)engine = memory;

insert into hier select parent_comment_id, comment_id, v_depth from comments where comment_id = p_comment_id;

/* http://dev.mysql.com/doc/refman/5.0/en/temporary-table-problems.html */

create temporary table tmp engine=memory select * from hier;

while not v_done do

    if exists( select 1 from comments c inner join hier on c.parent_comment_id = hier.comment_id and hier.depth = v_depth) then

        insert into hier 
            select c.parent_comment_id, c.comment_id, v_depth + 1 from comments c
            inner join tmp on c.parent_comment_id = tmp.comment_id and tmp.depth = v_depth;

        set v_depth = v_depth + 1;          

        truncate table tmp;
        insert into tmp select * from hier where depth = v_depth;

    else
        set v_done = 1;
    end if;

end while;

select 
 c.comment_id,
 c.subject,
 p.comment_id as parent_comment_id,
 p.subject as parent_subject,
 hier.depth
from 
 hier
inner join comments c on hier.comment_id = c.comment_id
left outer join comments p on hier.parent_comment_id = p.comment_id
order by
 hier.depth, hier.comment_id;

drop temporary table if exists hier;
drop temporary table if exists tmp;

end #

delimiter ;


call comments_hier(1);

call comments_hier(5);
f00
A: 

There are many ways. Here's one approach that I like (and use on a regular basis).

The database

Consider the following database structure:

CREATE TABLE comments (
  id int(11) unsigned NOT NULL auto_increment,
  parent_id int(11) unsigned default NULL,
  parent_path varchar(255) NOT NULL,

  comment_text varchar(255) NOT NULL,
  date_posted datetime NOT NULL,  

  PRIMARY KEY  (id)
);

your data will look like this:

+-----+-----------+-------------+---------------------------+-------------+
| id  | parent_id | parent_path | comment_text              | date_posted |
+-----+-----------+-------------+---------------------------+-------------+
|   1 | null      | /           | I'm first                 | 1288464193  |
|   2 | 1         | /1/         | 1st Reply to I'm First    | 1288464463  |
|   3 | null      | /           | Well I'm next             | 1288464331  |
|   4 | null      | /           | Oh yeah, well I'm 3rd     | 1288464361  |
|   5 | 3         | /3/         | reply to I'm next         | 1288464566  |
|   6 | 2         | /1/2/       | this is a 2nd level reply | 1288464193  |
+-----+-----------+-------------+---------------------------+-------------+

... and so on...

It's fairly easy to select everything in a useable way:

select id, parent_path, parent_id, comment_text, date_posted
from comments 
order by parent_path, date_posted;

Ordering by parent_path, date_posted returns the rows grouped by parent, with siblings in date order, which is usually the order you'll need when you generate your page. Be sure you have an index on the comments table that properly supports this; otherwise the query works, but it's very inefficient:

create index comments_hier_idx on comments (parent_path, date_posted);

For any given single comment, it's easy to get that comment's entire tree of child-comments. Just add a where clause:

select id, parent_path, parent_id, comment_text, date_posted
from comments 
where parent_path like '/1/%'
order by parent_path, date_posted;

the added where clause will make use of the same index we already defined, so we're good to go.
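Here is a small runnable sketch of the sub-tree query, using Python's built-in sqlite3 module as a stand-in for MySQL (that swap, and the helper name subtree_ids, are my assumptions; the table and sample rows are the ones shown above). For a nested comment the LIKE prefix is that comment's own parent_path followed by its id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER,
        parent_path TEXT NOT NULL,
        comment_text TEXT NOT NULL,
        date_posted INTEGER NOT NULL
    )
    """
)
conn.execute(
    "CREATE INDEX comments_hier_idx ON comments (parent_path, date_posted)"
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?, ?)",
    [
        (1, None, "/", "I'm first", 1288464193),
        (2, 1, "/1/", "1st Reply to I'm First", 1288464463),
        (3, None, "/", "Well I'm next", 1288464331),
        (4, None, "/", "Oh yeah, well I'm 3rd", 1288464361),
        (5, 3, "/3/", "reply to I'm next", 1288464566),
        (6, 2, "/1/2/", "this is a 2nd level reply", 1288464193),
    ],
)


def subtree_ids(conn, comment_id):
    """Ids of every descendant of comment_id, ordered by path then date."""
    row = conn.execute(
        "SELECT parent_path FROM comments WHERE id = ?", (comment_id,)
    ).fetchone()
    if row is None:
        return []
    # Descendants store this comment's path plus its id as a prefix.
    pattern = f"{row[0]}{comment_id}/%"
    cursor = conn.execute(
        "SELECT id FROM comments WHERE parent_path LIKE ? "
        "ORDER BY parent_path, date_posted",
        (pattern,),
    )
    return [r[0] for r in cursor]


print(subtree_ids(conn, 1))
print(subtree_ids(conn, 2))
```

Paths here contain only digits and slashes, so the LIKE pattern needs no escaping.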

Notice that we haven't used the parent_id yet. In fact, it's not strictly necessary. But I include it because it allows us to define a traditional foreign key to enforce referential integrity and to implement cascading deletes and updates if we want to. Foreign key constraints and cascading rules are only available in InnoDB tables:

ALTER TABLE comments ENGINE=InnoDB;

ALTER TABLE comments 
  ADD FOREIGN KEY ( parent_id ) REFERENCES comments ( id )
    ON DELETE CASCADE 
    ON UPDATE CASCADE;

Managing The Hierarchy

In order to use this approach, of course, you'll have to make sure you set the parent_path properly when you insert each comment. And if you move comments around (which would admittedly be a strange use case), you'll have to manually update the parent_path of every comment that is subordinate to the moved comment. But those are both fairly easy things to keep up with.
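As a rough sketch of that bookkeeping in middle-tier code (Python here; the helper names child_path and rebase_path are made up for illustration and are not part of any library):

```python
def child_path(parent_path, parent_id):
    """parent_path to store on a reply: the parent's path plus the parent's id."""
    return f"{parent_path}{parent_id}/"


def rebase_path(path, old_prefix, new_prefix):
    """Rewrite one descendant's parent_path after its ancestor was moved."""
    if not path.startswith(old_prefix):
        raise ValueError(f"{path!r} is not under {old_prefix!r}")
    return new_prefix + path[len(old_prefix):]


# Inserting: a reply to comment 2, whose own parent_path is "/1/".
reply_path = child_path("/1/", 2)

# Moving comment 2 from under comment 1 to under comment 3 (a root, path "/").
new_path_for_2 = child_path("/", 3)
old_prefix = child_path("/1/", 2)            # prefix its descendants had
new_prefix = child_path(new_path_for_2, 2)   # prefix they need now
moved = [rebase_path(p, old_prefix, new_prefix) for p in ["/1/2/", "/1/2/6/"]]

print(reply_path, new_path_for_2, moved)
```

In a real move you would run the insert or the path rewrite inside one transaction, so readers never see a half-updated sub-tree.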

If you really want to get fancy (and if your db supports it), you can write triggers to manage the parent_path transparently. I'll leave this as an exercise for the reader, but the basic idea is that insert and update triggers would fire before a new insert is committed. They would walk up the tree (using the parent_id foreign key relationship) and rebuild the value of the parent_path accordingly.

It's even possible to break the parent_path out into a separate table that is managed entirely by triggers on the comments table, with a few views or stored procedures to implement the various queries you need. That completely isolates your middle-tier code from the need to know or care about the mechanics of storing the hierarchy info.

Of course, none of the fancy stuff is required by any means -- it's usually quite sufficient to just drop the parent_path into the table, and write some code in your middle-tier to ensure that it gets managed properly along with all the other fields you already have to manage.


Imposing limits

MySQL (and some other databases) allows you to select "pages" of data using the LIMIT clause:

SELECT * FROM mytable LIMIT 25 OFFSET 0;

Unfortunately, when dealing with hierarchical data like this, the LIMIT clause alone won't yield the desired results.

-- the following will NOT work as intended

select id, parent_path, parent_id, comment_text, date_posted
from comments 
order by parent_path, date_posted
LIMIT 25 OFFSET 0;

Instead, we need to do a separate select at the level where we want to impose the limit, then join that back together with our "sub-tree" query to give the final desired results.

Something like this:

select 
  a.*
from 
  comments a join 
  (select id, parent_path 
    from comments 
    where parent_id is null
  order by parent_path, date_posted DESC 
  limit 25 offset 0) roots
  on (a.parent_path like concat(roots.parent_path,roots.id,'/%') or a.id=roots.id)
order by a.parent_path, a.date_posted DESC;

Notice the statement limit 25 offset 0, buried in the middle of the inner select. This statement will retrieve the most recent 25 "root-level" comments.
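To try the idea without a MySQL server, here is a sketch that runs the same join against SQLite through Python's sqlite3 module. Two things are assumptions of the sketch rather than part of the approach above: the helper name newest_threads, and the || operator, which SQLite uses where MySQL uses concat(). The rows are the sample data from earlier, trimmed to the columns the query needs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER,
        parent_path TEXT NOT NULL,
        date_posted INTEGER NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?)",
    [
        (1, None, "/", 1288464193),
        (2, 1, "/1/", 1288464463),
        (3, None, "/", 1288464331),
        (4, None, "/", 1288464361),
        (5, 3, "/3/", 1288464566),
        (6, 2, "/1/2/", 1288464193),
    ],
)


def newest_threads(conn, limit):
    """Ids of the newest `limit` root comments plus all of their replies."""
    sql = """
        SELECT a.id
        FROM comments a
        JOIN (SELECT id, parent_path
              FROM comments
              WHERE parent_id IS NULL
              ORDER BY date_posted DESC
              LIMIT ?) roots
          ON a.parent_path LIKE roots.parent_path || roots.id || '/%'
             OR a.id = roots.id
        ORDER BY a.parent_path, a.date_posted DESC
    """
    return [row[0] for row in conn.execute(sql, (limit,))]


print(newest_threads(conn, 2))
print(newest_threads(conn, 3))
```

With a limit of 2, the oldest root (comment 1) and its replies drop out entirely. The rows still come back grouped by parent_path, so the page code has to nest each reply under its parent.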

[edit: you may find that you have to play with things a bit to order and/or limit results exactly the way you like. This may include adding information to the hierarchy that's encoded in parent_path. For example, instead of /{id}/{id2}/{id3}/, you might decide to include the post date as part of the parent_path: /{id}:{post_date}/{id2}:{post_date2}/{id3}:{post_date3}/. This would make it very easy to get the order and hierarchy you want, at the expense of having to populate the field up front and manage it as the data changes.]

hope this helps. good luck!

Lee