Let's take a simple example, a blog post. I would store comments to a particular post within the same document.

from bson.objectid import ObjectId  # ships with PyMongo

# One blog post document with its comments embedded in an array
messages = { '_id' : ObjectId("4cc179886c0d49bf9424fc74"),
             'title' : 'Hello world',
             'comments' : [ { 'user_id' : ObjectId("4cc179886c0d49bf9424fc74"),
                              'comment' : 'hello to you too!'},
                            { 'user_id' : ObjectId("4cc1a1830a96c68cc67ef14d"),
                              'comment' : 'test!!!'}
                          ]
           }

The question is: would it make sense to store the username instead of the user's ObjectId (the primary key)? There are pros and cons to both. The pro is that if I display the username with each comment, I wouldn't have to run a second query. The con is that if "John Doe" decides to change his username, I would need to run an update across my entire collection to change his username in all comments and posts.

Which is more efficient?

A: 

Of course, it really depends on how much traffic you're going to get, how many comments you expect to have, and so on. But "do the simplest thing that works" is likely your friend here: it's simpler to store only the user_id, so do that until it stops working (e.g., because you've got a post with 100,000 comments that takes 30 seconds to render), then denormalize and store the username along with the comments.
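
To make the "store only the user_id" option concrete, here is a minimal sketch of the extra lookup it implies when rendering. It is not from the original post: the `users` dict, the `render_comments` helper, and the short string ids standing in for ObjectId values are all assumptions, and plain in-memory dicts stand in for the two collections.

```python
# In-memory stand-ins for a users collection and one post document.
# Short string ids replace ObjectId values to keep the sketch self-contained.
users = {
    "u1": {"username": "john_doe"},
    "u2": {"username": "jane"},
}
post = {
    "title": "Hello world",
    "comments": [
        {"user_id": "u1", "comment": "hello to you too!"},
        {"user_id": "u2", "comment": "test!!!"},
    ],
}


def render_comments(post, users):
    """Return (username, comment) pairs using one batched user lookup."""
    # Collect each distinct author once, so the second lookup is a single
    # batched fetch (with MongoDB, one query using the $in operator)
    # rather than one fetch per comment.
    author_ids = {c["user_id"] for c in post["comments"]}
    names = {uid: users[uid]["username"] for uid in author_ids}
    return [(names[c["user_id"]], c["comment"]) for c in post["comments"]]


rendered = render_comments(post, users)
```

The cost of this layout is that second lookup on every page view; the benefit is that a rename touches exactly one user record.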

David Wolever
A: 

I would store both fields. That way, you only run one query in the most common case (displaying the comments). Changing a username is rare, so you will not have to update the comments very often.

I would keep user_id because I don't like using a natural field such as username as a primary key, and matching on an ObjectId should be faster.
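
As a rough illustration of storing both fields, the sketch below keeps a copy of the username next to each user_id and refreshes those copies when a user is renamed. The `rename_user` helper, the sample data, and the string ids are hypothetical, and in-memory lists and dicts stand in for the collections; a real deployment would issue the equivalent update through a MongoDB driver.

```python
# Each comment carries both the stable user_id and a copied username.
users = {"u1": {"username": "john_doe"}, "u2": {"username": "jane"}}
posts = [
    {
        "title": "Hello world",
        "comments": [
            {"user_id": "u1", "username": "john_doe", "comment": "hello to you too!"},
            {"user_id": "u2", "username": "jane", "comment": "test!!!"},
        ],
    },
    {
        "title": "Second post",
        "comments": [
            {"user_id": "u1", "username": "john_doe", "comment": "first"},
        ],
    },
]


def rename_user(users, posts, user_id, new_username):
    """Rename a user and refresh the copied username in every comment."""
    users[user_id]["username"] = new_username
    updated = 0
    for post in posts:
        for comment in post["comments"]:
            # Match on the stable id, never on the old username.
            if comment["user_id"] == user_id:
                comment["username"] = new_username
                updated += 1
    return updated


updated_count = rename_user(users, posts, "u1", "john_d")
```

Displaying comments now needs no second lookup, while the rare rename pays for a scan over every post. Matching on user_id is what keeps that scan reliable even if two users ever shared a display name.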

Maxence