+2  A: 

You have two possible ways in which a user can follow another user; either directly, or indirectly through a group, in which case the user directly follows the group. Let's begin with storing these direct relations between users and groups:

{
  _id: "userA",
  followingUsers: [ "userB", "userC" ],
  followingGroups: [ "groupX", "groupY" ]
}

Now, you'll want to be able to quickly find out which users user A is following, either directly or indirectly. To achieve this, you can denormalize the groups that user A is following. Let's say that group X and Y are defined as follows:

{
  _id: "groupX",
  members: [ "userC", "userD" ]
},
{
  _id: "groupY",
  members: [ "userD", "userE" ]
}

Based on these groups, and the direct relations user A has, you can generate subscriptions between users. The origin(s) of a subscription are stored with each subscription. For the example data the subscriptions would look like this:

// abusing exclamation mark to indicate a direct relation
{ ownerId: "userA", userId: "userB", origins: [ "!" ] },
{ ownerId: "userA", userId: "userC", origins: [ "!", "groupX" ] },
{ ownerId: "userA", userId: "userD", origins: [ "groupX", "groupY" ] },
{ ownerId: "userA", userId: "userE", origins: [ "groupY" ] }

You can generate these subscriptions pretty easily, using a map-reduce-finalize call for an individual user. If a group is updated, you only have to re-run the map-reduce for all users that are following the group and the subscriptions will be up-to-date again.

Map-reduce

The following map-reduce functions will generate the subscriptions for a single user.

map = function () {
  ownerId = this._id;

  this.followingUsers.forEach(function (userId) {
    emit({ ownerId: ownerId, userId: userId } , { origins: [ "!" ] });
  });

  this.followingGroups.forEach(function (groupId) {
    group = db.groups.findOne({ _id: groupId });

    group.members.forEach(function (userId) {
      emit({ ownerId: ownerId, userId: userId } , { origins: [ group._id ] });
    });
  });
}

reduce = function (key, values) {
  origins = [];

  values.forEach(function (value) {
    origins = origins.concat(value.origins);
  });

  return { origins: origins };
}

finalize = function (key, value) {
  db.subscriptions.update(key, { $set: { origins: value.origins }}, true);
}

You can then run the map-reduce for a single user, by specifying a query, in this case for userA.

db.users.mapReduce(map, reduce, { finalize: finalize, query: { _id: "userA" }})

A few notes:

  • You should delete the previous subscriptions of a user, before running map-reduce for that user.
  • If you update a group, you should run map-reduce for all the users that follow the group.

I should note that these map-reduce functions turned out to be more complex than what I had in mind, because MongoDB doesn't support arrays as return values of reduce functions. In theory, the functions could be much simpler, but wouldn't be compatible with MongoDB. However, this more complex solution can be used to map-reduce the entire users collection in a single call, if you ever have to.

Niels van der Rest
This sounds like a good solution, thanks. The pagination problem is still there though: I can't use skip()/limit() with embedded documents. Basically as I said in the question, I need to list all the stuff an user is following (pretty much like Twitter does).
Brainfeeder
@Brainfeeder: You could store each subscription as a document in a separate collection to get around the skip/limit limitation. Then `"userA"` would be the `ownerId` for each of the subscriptions I mentioned, e.g. `{ ownerId: "userA", userId: "userB", origins: [ "!" ] }`.
Niels van der Rest
Exactly what I was thinking. Thanks a lot!
Brainfeeder
@Niels van der Rest: One thing: when should I run the Map/Reduce? Every time user A follows an user/group? Sorry, I'm still a bit confused about this as I'm new to NoSQL.
Brainfeeder
@Brainfeeder: Correct :)
Niels van der Rest
@Niels van der Rest: sorry to bother you, but what do you think about this related question I wrote a while ago? http://stackoverflow.com/questions/3838466/making-a-twitter-like-timeline-with-mongodb Thanks.
Brainfeeder
Also, one more thing. They say Map/Reduce shouldn't be used in realtime, because of it slowness. Could this be a problem for my app? Especially in the future, when the number of users, followers and groups will hopefully be pretty high?
Brainfeeder
@Brainfeeder: That's only the case for map-reduce on entire collections. But your map-reduce will only target a single user at a time. You're not map-reducing the entire `users` collection, but only a single document, so it shouldn't be slow. I'll update my answer with an example and have a look at your other question.
Niels van der Rest
@Brainfeeder: I've updated my answer, although it turned out to be a little more complex than what I had in mind. Regarding your other question: I agree with the posted answers. I've thought about that question before, but couldn't see any other efficient solution. In this question, map-reduce solves the need to differentiate between users and groups. But in the other question there is no such complexity, so using map-reduce wouldn't add any value :)
Niels van der Rest
Thanks for the massive answer, it all makes much more sense now. I was actually trying to write the MapReduce myself but I didn't even think I could use the finalize function to update the subscriptions collection, so... Anyway, I'm curious about why the functions are more complex than you initially thought. I'm just wondering how could they be simpler if MongoDB supported arrays as a return value? Sorry if I'm feeling kind of devoid of thoughts on this at the moment.
Brainfeeder
@Brainfeeder: You're welcome. In theory, the map function should just emit the `_id` of the followed user as the key, and the origin (direct or group) as the value. MongoDB then merges all these origins into a single array for each key, which is exactly what we need. So for user C the merged array would be `[ "!", "groupX" ]`. But you aren't allowed to simply return this array from the reduce function; you can only return a single value or object. But this returned value is then added to the array as well, which makes the final result incorrect.
Niels van der Rest
@Brainfeeder: To work around this, I had to introduce an object to hold the `origins` array and merge the values myself in the reduce function, using the `concat()` function.
Niels van der Rest
@Niels van der Rest: Finally that makes sense. I'm glad I asked my last question, I learned quite a lot thinking about it.
Brainfeeder
@Niels van der Rest: I just realized I also need to list all the groups an user is following. What do you think about storing relations between users and groups in another collection (thus duplicating the followingGroups embedded array)?
Brainfeeder
@Brainfeeder: I can't think of a good reason why you want to save the group relations in a separate collection just yet. A query on `groups` based on the IDs in the `followingGroups` array should be fast enough, in case you need additional group info, such as a name or description. If you see that this query is forming a bottleneck, then you should think about map-reducing it into a separate collection, or duplicate the required group properties directly to the `followingGroups` array.
Niels van der Rest
@Niels van der Rest: Yeah, I was a bit concerned about performance, but after all it's not like this structure is set in stone. I can always adapt it in the future if it turns out to be slow with a lot of users. So I'll go with what you suggested. Thanks again.
Brainfeeder