views:

680

answers:

2

What're the best practices to store nested sets (like trees of comments) in MongoDB?

I mean, every comment can have a parent comment and children-comments (answers).

Storing them like this:

{
   title: "Hello",
   body: "Please comment me!",
   comments: [
        {
            author: "Peter",
            text: "Hi there",
            answers: [
                  {
                      author: "Peter",
                      text: "Hi there",
                      answers: [
                                 { author: "Ivan", text: "Hi there" },
                                 { author: "Nicholas", text: "Hi there" }
                      ]
                  },
                  { author: "Ivan", text: "Hi there" },
                  { author: "Nicholas", text: "Hi there" },
            ]
        },
        { author: "Ivan", text: "Hi there" },
        { author: "Nicholas", text: "Hi there" },
   ]
}

is not cool, because we can't, for example, ask for "all post which are commented by Peter" without map/reduce (hm, can we?)

+2  A: 

I think there is no perfect solution - depends on what operations are more important for your app. I believe Silicon Alley Insider stores comments nested with MongoDB for example. That does make the query you mention harder.

One option is store at top-level in the post a list of all commenters in an array. Think of that as denormalized data. Then one can easily find all posts which involve a certain commenter. Then to drill down, you use map/reduce or db.eval() to get the nested post info within.

One other note - if you are dealing with a single document, db.eval() is probably lighter-weight than map/reduce. $where is also an option but can be slow so I like the additional 'list of commenters' mentioned above - not it is also easy to index that array too (see 'Multikey' in the docs).

See also: http://groups.google.com/group/mongodb-user/browse%5Fthread/thread/df8250573c91f75a/e880d9c57e343b52?lnk=gst&q=trees#e880d9c57e343b52

dm
A: 

In the link from dm's post Dwight Merriman mentions using a path key and doing regex matches

{
  path : "a.b.c.d.e.f"
}

Another way to do this would be with arrays

{
  path : ["a", "b", "c", "d", "e", "f"]
}

db.test.ensureIndex({path: 1})

that should make it pretty fast.

if each node can only be in a single path then you wouldn't need to do worry about where it is located in the list

db.test.find({path: "a"})

would find all children of "a"

Instead of path names I would probably use the _id of the nodes.

Pete Brumm