A while back, a Digg developer posted this blog entry, http://about.digg.com/blog/looking-future-cassandra, in which he described one of the issues that was not optimally solved in MySQL. This was cited as one of the reasons for their move to Cassandra.

I have been playing with MongoDB and I would like to understand how to implement the MongoDB collections for this problem.

From the article, the MySQL schema for this information is:

CREATE TABLE `Diggs` (
  `id`      INT(11),
  `itemid`  INT(11),
  `userid`  INT(11),
  `digdate` DATETIME,
  PRIMARY KEY (`id`),
  KEY `user`  (`userid`),
  KEY `item`  (`itemid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `Friends` (
  `id`           INT(10) AUTO_INCREMENT,
  `userid`       INT(10),
  `username`     VARCHAR(15),
  `friendid`     INT(10),
  `friendname`   VARCHAR(15),
  `mutual`       TINYINT(1),
  `date_created` DATETIME,
  PRIMARY KEY                (`id`),
  UNIQUE KEY `Friend_unique` (`userid`,`friendid`),
  KEY        `Friend_friend` (`friendid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

This problem is ubiquitous in social networking implementations. People befriend a lot of people, who in turn digg a lot of things. Quickly showing a user what his/her friends are up to is critical.

I understand that several blogs have since provided a pure RDBMS solution with indexes for this issue; however, I am curious how this could be solved in MongoDB.

+1  A: 

One way to do this would be to add an array of "friends" to each post.

{
  date: Date(...),
  friends: ['me', 'you', 'thatguy'],
  ...
}
db.posts.ensureIndex({friends:1, date:-1})

Then you could easily display my page by doing this: db.posts.find({friends:'me'}).sort({date:-1})

This will work as long as each user has fewer than about 200,000 friends; you may need to special-case posts from users with more than that. One way would be to split the friend list into multiple chunks of 100,000 and make one post entry per chunk.
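
To make the chunking idea concrete, here is a rough sketch in plain JavaScript that runs without a database. It assumes the "friends" array on a post holds the _ids of the users who should see that post; buildPostDocs, feedFor, and the sample data are hypothetical names made up for illustration, and the filter only imitates what the find() query above would do.

```javascript
// Hypothetical helper: build the post documents for one digg.
// `audience` is assumed to be the list of user _ids who should see the post.
// Large audiences are split into several documents of at most `chunkSize` ids.
function buildPostDocs(post, audience, chunkSize) {
  const docs = [];
  for (let i = 0; i < audience.length; i += chunkSize) {
    docs.push({ ...post, friends: audience.slice(i, i + chunkSize) });
  }
  // A post with an empty audience still gets one document.
  if (docs.length === 0) docs.push({ ...post, friends: [] });
  return docs; // each of these would be stored with db.posts.insert(doc)
}

// In-memory stand-in for db.posts.find({friends: userId}).sort({date: -1}).
// A single query value matches when it is an element of the array field.
function feedFor(posts, userId) {
  return posts
    .filter((p) => p.friends.includes(userId))
    .sort((a, b) => b.date - a.date);
}

const posts = [
  // Small audience: one document.
  ...buildPostDocs(
    { itemid: 1, userid: "you", date: new Date("2010-03-01") },
    ["me", "thatguy"],
    100000
  ),
  // Chunk size of 2 here only to force a split in this tiny example.
  ...buildPostDocs(
    { itemid: 2, userid: "celebrity", date: new Date("2010-03-02") },
    ["a", "b", "me"],
    2
  ),
];

const myFeed = feedFor(posts, "me");
console.log(myFeed.map((p) => p.itemid)); // [ 2, 1 ]
```

Because each user _id lands in exactly one chunk, the split does not produce duplicate entries in anyone's feed.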

mstearn
I am assuming that in your example "friends" is an array of references to other users. I am assuming that the users are stored in a users collection and the "friends" array is merely a list of references to users in that collection. So if I wanted to find all posts by my friends, would I not have to do a "join"? (I apologize for using terms such as joins; I am coming from a SQL background.)
Friends should be an array of whatever you use as the _id in your user objects. This can be either a string or an ObjectID; both have advantages and disadvantages. I wouldn't do a join or join-like operation. Instead, just store the current user's _id in a cookie or in your session object. Then replace the string 'me' in my example query with the current user's _id and it will do what you want. By default, if you query an array field using a single value, mongo will check whether that value is in the array.
mstearn
Thanks! I will try this solution.
+1  A: 

There are many possible solutions with mongo. You can still store the diggs in a top-level table (a.k.a. collection) like a relational db, but additionally could store diggs as an array in the item collection or the user collection. Similarly the friend relation could be kept as an array in the user collection in either the forward or reverse direction.

Probably the most straightforward approach would be an array of diggs in the items, and an array of friends in the users. Then a simple indexed query to retrieve a user's friends is followed by an "in" query on the indexed items.diggs.userid field.

Mongo's own documentation of the $in operator actually uses this example.
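
As a rough illustration of that two-step lookup, the sketch below mimics it in plain JavaScript so it runs without a database. The sample users and items, and the helper names friendsOf and itemsDuggBy, are hypothetical; the comments show the mongo shell queries each step is meant to stand in for.

```javascript
// Hypothetical sample data: users embed friend _ids, items embed their diggs.
const users = [
  { _id: "me", friends: ["you", "thatguy"] },
  { _id: "you", friends: ["me"] },
];
const items = [
  { _id: 1, diggs: [{ userid: "you", digdate: new Date("2010-03-01") }] },
  { _id: 2, diggs: [{ userid: "stranger", digdate: new Date("2010-03-02") }] },
  {
    _id: 3,
    diggs: [
      { userid: "thatguy", digdate: new Date("2010-03-03") },
      { userid: "me", digdate: new Date("2010-03-04") },
    ],
  },
];

// Step 1, stand-in for: db.users.findOne({_id: userId}).friends
function friendsOf(userId) {
  const user = users.find((u) => u._id === userId);
  return user ? user.friends : [];
}

// Step 2, stand-in for:
//   db.items.ensureIndex({"diggs.userid": 1})
//   db.items.find({"diggs.userid": {$in: friendIds}})
// An item matches when any of its embedded diggs was made by one of the friends.
function itemsDuggBy(friendIds) {
  const wanted = new Set(friendIds);
  return items.filter((item) => item.diggs.some((d) => wanted.has(d.userid)));
}

const duggByMyFriends = itemsDuggBy(friendsOf("me")).map((item) => item._id);
console.log(duggByMyFriends); // [ 1, 3 ]
```

This is two round trips instead of a join: one to read the friend list, one to match items against it.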

Coady
For Digg, the "in" query was the problem. The friends list came from Memcached, and they ran an "IN" query, which was the bottleneck. (In my experience with SQL, 'IN' is not an efficient way to query.) In your experience, how has the $in operator performed with, let's say, about 50 values in the "in" set?