Hi everyone, I am new to MongoDB. This is my DB design:

product := { 
    name: str,
    group: ref,
    comments: [ ref, ref, ref, ref ] 
}

comments := { 
... a bunch of comments stuff

} 

tag := { 
    _id: int,               #Need this for online requests
    tag: str,
    products: [ {product: ref, score: float}, ... ],
    comments: [ {comment: ref, score: float}, ...],
}

So my usage pattern is: given a product, find its comments that have a certain tag and sort them by their score for that tag.

My current approach involves:

  1. Look up the tag object where tag=myTag
  2. Pull out all of its comments, sorted
  3. Look up the product where product.name=myProduct
  4. Pull out all of its comments (which are DBRefs, by the way)
  5. Loop through the result of step 2, checking whether each comment is in the result of step 4 (here I can apply a limit of 10), etc.
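
For reference, here is a rough sketch of those five steps in plain JavaScript, with in-memory objects standing in for the two lookups in steps 1 and 3. The sample data, the function name taggedCommentsForProduct, and the descending sort order are assumptions made only for illustration.

```javascript
// Hypothetical stand-ins for the documents returned by steps 1 and 3.
const tagDoc = {
  _id: 1,
  tag: "durable",
  comments: [
    { comment: "c1", score: 0.4 },
    { comment: "c2", score: 0.9 },
    { comment: "c3", score: 0.7 },
  ],
};
const productDoc = { name: "kettle", comments: ["c3", "c2", "c9"] };

// Steps 2, 4 and 5: sort the tag's comments by score (highest first, an
// assumption), then keep only those that also belong to the product,
// stopping once `limit` matches have been collected.
function taggedCommentsForProduct(tag, product, limit) {
  const productComments = new Set(product.comments); // step 4
  const sorted = [...tag.comments].sort((a, b) => b.score - a.score); // step 2
  const result = [];
  for (const entry of sorted) { // step 5
    if (result.length >= limit) break;
    if (productComments.has(entry.comment)) result.push(entry);
  }
  return result;
}

const top = taggedCommentsForProduct(tagDoc, productDoc, 10);
```

Every tagged comment may have to be scanned even though only a few belong to the product, which is where the cost comes from.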

It's pretty inefficient. Any better methods?

A: 

I'm not sure if I understood what you're trying to do correctly, but if each comment could have multiple tags and is a comment on a single product, then you could make each comment have tag and product fields. Then your comment documents would look like:

comment := {
    product: product_id,
    tags: [tag1, tag2, ... ],
    ...
}

Then, given a product, you could do:

db.comments.find({product : productId, tags : myTag})
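
To make the matching rule concrete, here is a small in-memory sketch in plain JavaScript of what that query selects. The sample documents and the findComments helper are made up for illustration; the point it shows is that both conditions must hold, and that a condition on an array field such as tags matches when the array contains the value.

```javascript
// Hypothetical comment documents following the schema above.
const comments = [
  { _id: "c1", product: "p1", tags: ["durable", "cheap"] },
  { _id: "c2", product: "p1", tags: ["noisy"] },
  { _id: "c3", product: "p2", tags: ["durable"] },
];

// Mirrors find({product: productId, tags: myTag}) on an in-memory array:
// the product must be equal and the tags array must contain myTag.
function findComments(docs, productId, myTag) {
  return docs.filter(
    (doc) => doc.product === productId && doc.tags.includes(myTag)
  );
}

const matches = findComments(comments, "p1", "durable");
```

Sorting by score would need a score stored on each comment, which the schema above leaves out.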
kristina
A: 

The approach is inefficient because the database design itself works against this query.

You built the "tags" collection as a parent to the "comments" collection. But then you say that you want to load the "comments" by "tag".

Typically, when tagging "comments" or "products", the "tag" belongs to the "comment" or "product". You've reversed this: you're referencing comments from tags instead of looking up comments by tag.

What I think you're looking for is something more like this.

  • A Product contains Comments
  • A Product can be Tagged
  • A Comment can be Tagged
  • All Tags have a Score

Here's what that data structure looks like:

product := { 
    name: str,
    group: ref,
    tags: [ {ref, score}, {ref, score},... ],
    comments: [ { ref, tags: [ {ref, score}, {ref, score},... ] },
                { ref, tags: [ {ref, score}, {ref, score},... ] }, ... 
              ]
}

If you want to take this one step further, you can even remove the "comments" collection entirely. Comments without a Product probably don't mean anything, so you can create the whole Comment "object" within the Product "object".

From an "indexing" perspective, you can index within the arrays. So you can set up an index on product.tags and product.comments.tags.

Now your query is much easier. You can literally just grab a Product and then loop through the Comments array looking for the appropriate Tag. Or you can run the query server-side and get it to order the Tags by score.
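
As a rough sketch of the client-side option, here is the loop in plain JavaScript over a Product that has already been loaded. The field names ref and score follow the structure above; the sample data, the commentsWithTag helper, and the highest-score-first ordering are assumptions for illustration.

```javascript
// Hypothetical Product document with its Comments and their Tags embedded.
const product = {
  name: "kettle",
  tags: [{ ref: "kitchen", score: 0.8 }],
  comments: [
    { ref: "c1", tags: [{ ref: "durable", score: 0.4 }] },
    { ref: "c2", tags: [{ ref: "noisy", score: 0.6 }] },
    {
      ref: "c3",
      tags: [
        { ref: "durable", score: 0.9 },
        { ref: "cheap", score: 0.2 },
      ],
    },
  ],
};

// Collect the Comments carrying the given Tag, then order them by that
// Tag's score, highest first.
function commentsWithTag(doc, tagRef) {
  const hits = [];
  for (const comment of doc.comments) {
    const tag = comment.tags.find((t) => t.ref === tagRef);
    if (tag) hits.push({ ref: comment.ref, score: tag.score });
  }
  return hits.sort((a, b) => b.score - a.score);
}

const durable = commentsWithTag(product, "durable");
```

This only touches one Product document, so the work is bounded by the number of Comments on that Product rather than by the number of Comments carrying the Tag.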

Gates VP