First off, you should know the tradeoffs you are going to get with MongoDB and any other NoSQL database (but realize that I am a fan of it). If you are trying to normalize your data completely, you are making a big mistake. Even in relational databases, the larger your app gets, the more your data gets denormalized (see this post by Hot Potato). I've seen this time and time again. You should not go nuts and make a huge mess, but don't worry about repeating information in two places. One of the major points (in my opinion) of NoSQL is that your schema moves into your code and not solely into the database.
Now, to answer your question, I think your initial strategy is what I would do. MongoDB can index fields that hold arrays, so that will make things a lot faster if you are looking for how many friendships a user has. But in reality, the only way to be sure is to run a test program that generates a database full of names and relationships.
You can script up some input in Python or Perl or whatever you like, and use a file of names to generate some relationships. Check out the Census website, which has a list of last names. Download the file dist.all.last and write a program like:
#!/usr/bin/env python
import random

# Read the surnames; the name is the first whitespace-separated column.
with open('dist.all.last') as f:
    names = [line.split()[0] for line in f if line.strip()]

# Map each name to a list of randomly chosen friends.
rels = {}
for name in names:
    num_friends = random.randint(0, 1000)
    friends = set()  # a set drops duplicate picks of the same friend
    for _ in range(num_friends):
        new_friend = random.choice(names)
        if new_friend != name:  # cannot be friends with yourself
            friends.add(new_friend)
    rels[name] = list(friends)

# take relationships (i.e. rels) and write them to MongoDB
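One way to finish that last step is sketched below. It is only a sketch: it assumes the pymongo driver and a server on localhost, and the database name `test`, the collection name `users`, and the `friends` field are hypothetical names I picked for illustration, not anything from your schema.

```python
def build_docs(rels):
    """Turn {name: [friends]} into one document per user."""
    return [{"username": name, "friends": sorted(friends)}
            for name, friends in rels.items()]


def write_to_mongo(docs, uri="mongodb://localhost:27017"):
    """Insert the documents. The URI, database and collection names are assumptions."""
    from pymongo import MongoClient  # imported here so build_docs works without the driver
    users = MongoClient(uri)["test"]["users"]
    users.insert_many(docs)
    users.create_index("friends")  # index over the values inside the array


# Tiny stand-in for the rels dict built from dist.all.last
sample = {"SMITH": ["JONES", "BROWN"], "JONES": []}
docs = build_docs(sample)
```

Calling `write_to_mongo(build_docs(rels))` would then load the generated data, after which you can time the queries you actually care about.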
Also, as a general note, your field names seem kind of long. Remember that field names are stored again in every document in the collection, because there is no fixed schema and MongoDB cannot assume that a field in one document exists in any other. To save space, a general strategy is to use shorter field names like "unam" instead of "username", but that's a small thing. See the great advice in these two posts.
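To get a feel for how much the names cost, here is a rough back-of-the-envelope sketch. It only counts top-level field names (BSON stores each name as a NUL-terminated string, hence the +1) and ignores values and nested documents; the short names `pic` and `subs` are made up for the example.

```python
def fieldname_bytes(doc):
    """Bytes spent on top-level field names in one document.

    Each name is counted as its UTF-8 length plus one terminating NUL byte.
    Values and nested documents are ignored in this rough estimate.
    """
    return sum(len(key.encode("utf-8")) + 1 for key in doc)


long_doc = {"username": "alan", "photo": "123.jpg", "subscriptions": []}
short_doc = {"unam": "alan", "pic": "123.jpg", "subs": []}  # hypothetical short names

saved_per_doc = fieldname_bytes(long_doc) - fieldname_bytes(short_doc)
print(saved_per_doc * 1_000_000)  # bytes saved across a hypothetical million documents
```

The saving per document is small, so this only starts to matter once the collection is large.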
EDIT:
Actually, in pondering your problem a little more, I would make one more suggestion: break up the subscription types into different fields to make the indexes more efficient. For example, instead of:
{
    "username" : "alan",
    "photo": "123.jpg",
    "subscriptions" : [
        {"username" : "john", "status" : "accepted"},
        {"username" : "paul", "status" : "pending"}
    ]
}
as in your question, I would do this:
{
    "username" : "alan",
    "photo": "123.jpg",
    "acc_subs" : [ "john" ],
    "pnd_subs" : [ "paul" ]
}
That way you can have an index for each type of subscription, which makes queries like "How many people have Paul as pending?" and "How many people subscribe to Paul?" super fast either way. Mongo's indexing over array values is truly an epic win.
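If you already have documents in the nested shape, a conversion could look roughly like the sketch below. The helper names are mine, and it assumes the only statuses are "accepted" and "pending" (anything else would be dropped). In MongoDB itself, a filter such as `{"pnd_subs": "paul"}` matches every document whose `pnd_subs` array contains "paul"; `count_matching` is just an in-memory stand-in for that so the example runs without a server.

```python
def split_subscriptions(doc):
    """Convert the nested "subscriptions" list into one array per status."""
    subs = doc.get("subscriptions", [])
    new_doc = {k: v for k, v in doc.items() if k != "subscriptions"}
    new_doc["acc_subs"] = [s["username"] for s in subs if s["status"] == "accepted"]
    new_doc["pnd_subs"] = [s["username"] for s in subs if s["status"] == "pending"]
    return new_doc


def count_matching(docs, field, value):
    """In-memory stand-in for counting documents that match {field: value}."""
    return sum(1 for d in docs if value in d.get(field, []))


alan = {
    "username": "alan",
    "photo": "123.jpg",
    "subscriptions": [
        {"username": "john", "status": "accepted"},
        {"username": "paul", "status": "pending"},
    ],
}
converted = split_subscriptions(alan)
pending_on_paul = count_matching([converted], "pnd_subs", "paul")
```

With one index on `acc_subs` and one on `pnd_subs`, each of those two counts can be answered from its own index.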