views:

615

answers:

3

I want to represent threaded comments in Java. This would look similar to the way comments are threaded on reddit.com

hello
   hello
      hello
      hello
   hello
   hello
      hello

As in the example above, responses are nested in the HTML with appropriate indentation to reflect their relationship to prior comments.

What would be an efficient way to represent this in Java?

I'm thinking some kind of tree data structure would be appropriate.

But is there one in particular which would be most efficient to minimize tree traversals?

This would be important if I have voting on each comment. Because then the tree would need to be reordered after each vote - a potentially expensive operation computationally.

By the way, if anyone knows of an open source existing implementation of this in Java, that would help too.

A: 

This would be important if I have voting on each comment. Because then the tree would need to be reordered after each vote - a potentially expensive operation computationally.

Sounds like a premature optimization to me, possibly even a faulty optimization.

Your tree data structure sounds logical for representing your data. I say stick with it. Optimize it later only if a performance problem is detected and measured, and can be compared with alternatives.

Stu Thompson
Why faulty? Doesn't it make sense to try to start with the most efficient data structure when you can anticipate performance overhead?
Hula
Maybe the quote should be: "Premature optimization is evil, but so is choosing a stupid data structure" :-) [The word "stupid" is NOT in reference to anything Stu or Hula said, I'd just like to make that clear].
paxdiablo
*Possibly* faulty because you won't know what the most efficient data structure is for your use case until you try it out. (Plenty of folks attempt optimization only to have it be slower than simpler code.) Until then, use the structure that results in clearly readable code, which matches your ideas.
Stu Thompson
+3  A: 

The tree is right (with getLastSibling and getNextSibling), but if you're storing/querying the data, you probably want to store a lineage for each entry, or number by a preorder traversal:

http://www.sitepoint.com/article/hierarchical-data-database/2/

For loss of the exact number of subnodes, you can leave gaps to minimise renumbering. Still, I'm not certain that this will be noticeably faster than traversing the tree each time. I guess it depends how deep your tree grows.

See also:

http://stackoverflow.com/questions/38801/sql-how-to-store-and-navigate-hierarchies http://www.ibase.ru/devinfo/DBMSTrees/sqltrees.html (this scheme is also call a Celko tree)

Mark
Great link. Thanks.
Hula
+7  A: 

I would use levels of linked lists.

message1
    message2
        message3
        message4
    message5
    message6
        message7

Each node would have a pointer to its:

- forward sibling  (2->5, 3->4, 5->6,                   1/4/6/7->NULL).
- backward sibling (4->3, 5->2, 6->5,                   1/2/3/7->NULL).
- first child      (1->2, 2->3, 6->7,                   3/4/5/7->NULL).
- parent           (2->1, 3->2, 4->2, 5->1, 6->1, 7->6,       1->NULL).

Within each level, messages would be sorted in the list by vote count (or whatever other score you wanted to use).

That would give you maximum flexibility for moving things around and you could move whole sub-trees (e.g., message2) just by changing the links at the parent and that level.

For example, say message6 gets a influx of votes that makes it more popular than message5. The changes are (adjusting both the next and previous sibling pointers):

  • message2 -> message6
  • message6 -> message5
  • message5 -> NULL.

to get:

message1
    message2
        message3
        message4
    message6
        message7
    message5

If it continues until it garners more votes than message2, the following occurs:

  • message6 -> message2
  • message2 -> message5

AND the first-child pointer of message1 is set to message6 (it was message2), still relatively easy, to get:

message1
    message6
        message7
    message2
        message3
        message4
    message5

Re-ordering only needs to occur when a score change results in a message becoming more than its upper sibling or less than its lower sibling. You don't need to re-order after every score change.

paxdiablo
Wow! Thanks for taking the time explain this. I appreciate it.
Hula