Links have one or more tags, so at first it might seem natural to embed the tags:

link = { title: 'How would you implement these queries efficiently in MongoDB?',
         url: 'http://stackoverflow.com/questions/3720972',
         tags: ['ruby', 'mongodb', 'database-schema', 'database-design', 'nosql'] }

How would these queries be implemented efficiently?

  • Get links that contain one or more given tags (for searching links with given tags)
  • Get a list of all tags without repetition (for search box auto-completion)
  • Get the most popular tags (to display top 10 tags or a tag cloud)

The idea to represent the link as above is based on the MongoNY presentation, slide 38.

+2  A: 

Get links that contain the tag "value":

db.col.find({tags: "value"});

Get links that contain both the "val1" and "val2" tags (use $in in place of $all to match links that contain at least one of them):

db.col.find({tags: { $all : [ "val1", "val2" ] }});

Get list of all tags without repetition:

db.col.distinct("tags");
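The difference between matching any of the given tags and matching all of them is easy to mix up, so here is a minimal sketch of the semantics in plain JavaScript. It runs against an in-memory array rather than MongoDB; the sample documents and the helper names (findAnyTag, findAllTags, distinctTags) are made up for illustration and are not part of any MongoDB API.

```javascript
// In-memory stand-in for the links collection (sample data is made up).
const links = [
  { title: "A", tags: ["ruby", "mongodb"] },
  { title: "B", tags: ["mongodb", "nosql"] },
  { title: "C", tags: ["python"] },
];

// Mirrors db.col.find({tags: {$in: wanted}}): at least one wanted tag is present.
function findAnyTag(docs, wanted) {
  return docs.filter((doc) => doc.tags.some((t) => wanted.includes(t)));
}

// Mirrors db.col.find({tags: {$all: wanted}}): every wanted tag is present.
function findAllTags(docs, wanted) {
  return docs.filter((doc) => wanted.every((t) => doc.tags.includes(t)));
}

// Mirrors db.col.distinct("tags"): each tag once (first-seen order in this sketch).
function distinctTags(docs) {
  return [...new Set(docs.flatMap((doc) => doc.tags))];
}

// Links A and B each carry at least one of the two tags.
console.log(findAnyTag(links, ["ruby", "nosql"]).map((d) => d.title));
// Only link A carries both tags.
console.log(findAllTags(links, ["ruby", "mongodb"]).map((d) => d.title));
// ruby, mongodb, nosql, python
console.log(distinctTags(links));
```

For the "one or more given tags" search in the question, the $in form is probably the one you want; $all narrows the result to links that carry every listed tag.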

Get the most popular tags: this isn't something a simple query can answer from the existing documents. What you need to do is add a popularity field, update it whenever a query fetches the document, and then run a query sorted by that popularity field.
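If "popular" is instead taken to mean "used on the most links" (an assumption; the answer defines popularity by how often a document is fetched), the ranking can be derived from the link documents themselves. Below is a minimal in-memory sketch in plain JavaScript; sampleLinks and topTags are hypothetical names, and the tie-break by tag name is only there to make the output deterministic. Later MongoDB releases added an aggregation pipeline that can express the same counting server-side, as noted in the comment.

```javascript
// Made-up sample data standing in for the links collection.
const sampleLinks = [
  { tags: ["ruby", "mongodb"] },
  { tags: ["mongodb", "nosql"] },
  { tags: ["mongodb", "ruby"] },
];

// Counts how many links carry each tag and returns the n most used.
// Roughly what this pipeline computes on the server:
//   db.col.aggregate([
//     { $unwind: "$tags" },
//     { $group: { _id: "$tags", count: { $sum: 1 } } },
//     { $sort: { count: -1 } },
//     { $limit: 10 }
//   ])
function topTags(docs, n) {
  const counts = new Map();
  for (const doc of docs) {
    for (const tag of doc.tags) {
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([tag, count]) => ({ tag, count }))
    // Highest count first; ties broken alphabetically (a choice of this sketch).
    .sort((a, b) => b.count - a.count || (a.tag < b.tag ? -1 : a.tag > b.tag ? 1 : 0))
    .slice(0, n);
}

// mongodb is on 3 links, ruby on 2.
console.log(topTags(sampleLinks, 2));
```

Counting on every request gets expensive as the collection grows, which is one reason to keep a precomputed counter per tag.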

Update: proposed solution for popularity feature. Try adding the following collection, let's call it tags.

doc = { tag: String, pop: Integer }

Now, once you run a query, you collect all the tags that were shown (these can be aggregated and processed asynchronously). Let's say you end up with the following tags: "tag1", "tag2", "tag3".

You then call the update method and increment the pop field value:

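// multi: true is needed here; by default update() modifies only the first matching document
db.tags.update({tag: { $in: ["tag1", "tag2", "tag3"] }}, { $inc: { pop: 1 }}, { multi: true });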
Asaf
In order to add a popularity field for a tag, the tag would need to be added or moved into a separate collection, correct?
randomguy
You don't have to. You could keep it in the same collection and just use a DBRef to point to the tag. A separate collection just makes it simpler to manage your data (which is what I recommend).
Asaf
In the tags collection I'd suggest putting the tag name in the _id field rather than using a separate tag field. Also, if you don't mind doing one update per tag rather than using $in, you can make the query just {_id: "tag_name"} and use the upsert feature to create new tag entries.
mstearn
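A minimal sketch of the upsert-per-tag idea from the comment above, written as plain JavaScript over an in-memory Map so it can be run without a server. The names tagStore, bumpTag and mostPopular are hypothetical; the comments note the shell calls each helper is meant to mirror.

```javascript
// Stand-in for the tags collection, keyed by tag name the way _id would be.
const tagStore = new Map();

// Mirrors db.tags.update({_id: name}, {$inc: {pop: 1}}, {upsert: true}):
// creates the entry with pop = 1 if it is missing, otherwise increments it.
function bumpTag(store, name) {
  store.set(name, (store.get(name) || 0) + 1);
}

// Mirrors db.tags.find().sort({pop: -1}).limit(n).
function mostPopular(store, n) {
  return [...store.entries()]
    .map(([_id, pop]) => ({ _id, pop }))
    .sort((a, b) => b.pop - a.pop)
    .slice(0, n);
}

// Three queries showed these tags (made-up data).
const shownPerQuery = [["tag1", "tag2", "tag3"], ["tag2"], ["tag2", "tag3"]];
for (const shown of shownPerQuery) {
  shown.forEach((name) => bumpTag(tagStore, name));
}

// tag2 was shown 3 times, tag3 twice.
console.log(mostPopular(tagStore, 2));
```

The trade-off is one update per tag instead of a single $in update, in exchange for new tags being created automatically.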
@Asaf, mstearn: How would you rename an existing tag?
randomguy
Renaming would be difficult, since your data isn't linked but duplicated. You would either have to run an update on all the documents in your db, swapping the old tag name for the new one, or store a separate "id" for each tag in the link document and run an extra query to get the tag name from the tags collection. The former is better when renaming rarely happens; the latter is better for a read-intensive collection.
Asaf
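The first renaming strategy (rewrite the tag name in every link) could be sketched like this in plain JavaScript over in-memory data. Everything here is illustrative: linkDocs, tagPop and renameTag are hypothetical names, and merging the popularity counters when the new name already exists is an assumption of this sketch, not something the thread specifies. In the shell, the per-link rewrite would roughly correspond to a multi-document update using the positional operator, as noted in the comment.

```javascript
// Made-up sample data: embedded tags on links, plus a popularity counter per tag.
const linkDocs = [
  { tags: ["nosql", "db"] },
  { tags: ["db", "database"] },
  { tags: ["ruby"] },
];
const tagPop = new Map([["db", 5], ["database", 2]]);

// Renames oldName to newName everywhere. Roughly what this shell call does per link:
//   db.col.update({tags: oldName}, {$set: {"tags.$": newName}}, {multi: true})
// This sketch also drops the duplicate if a link already carries newName.
function renameTag(docs, pop, oldName, newName) {
  for (const doc of docs) {
    if (doc.tags.includes(oldName)) {
      const renamed = doc.tags.map((t) => (t === oldName ? newName : t));
      doc.tags = [...new Set(renamed)];
    }
  }
  // Move the counter too; if tag names are stored as _id, this means
  // inserting a new entry and removing the old one, since _id cannot be changed.
  if (pop.has(oldName)) {
    pop.set(newName, (pop.get(newName) || 0) + pop.get(oldName));
    pop.delete(oldName);
  }
}

renameTag(linkDocs, tagPop, "db", "database");
console.log(linkDocs.map((d) => d.tags));
console.log([...tagPop.entries()]);
```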