Take Facebook's private messaging system, where you have to keep track of the sender and receiver along with the message content. If I were using MySQL I would have multiple tables, but with MongoDB I'll try to avoid all that. I'm trying to come up with a "good" schema that can scale and is easy to maintain. In MySQL, I would use a separate table linking users to messages. See below:

profiles table

user_id
first_name
last_name

message table

message_id
message_body
time_stamp

user_message_ref table

user_id (FK)
message_id (FK)
is_sender (boolean)

With the schema listed above, I can query for any messages that "Bob" has, regardless of whether he's the sender or the recipient.
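To make the relational version concrete, here is a minimal sketch of the three tables and the "all of Bob's messages" query. It assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for MySQL; the sample rows and the messages_for helper are hypothetical. The point is that a single join on user_message_ref returns Bob's messages whether he sent or received them, and is_sender tells the two cases apart.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL schema above (assumption: the SQL
# is simple enough to port as-is). Sample data below is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profiles (
    user_id    INTEGER PRIMARY KEY,
    first_name TEXT,
    last_name  TEXT
);
CREATE TABLE message (
    message_id   INTEGER PRIMARY KEY,
    message_body TEXT,
    time_stamp   TEXT
);
CREATE TABLE user_message_ref (
    user_id    INTEGER REFERENCES profiles(user_id),
    message_id INTEGER REFERENCES message(message_id),
    is_sender  INTEGER  -- 1 = sender, 0 = recipient
);
""")

conn.executemany("INSERT INTO profiles VALUES (?, ?, ?)",
                 [(1, "Bob", "Smith"), (2, "Amy", "Jones")])
conn.executemany("INSERT INTO message VALUES (?, ?, ?)",
                 [(10, "Hi Amy, how are you?!", "2010-04-20T10:35"),
                  (11, "Bob, long time no see, how is the family?!", "2010-04-20T10:40")])
# Each message gets one reference row per participant.
conn.executemany("INSERT INTO user_message_ref VALUES (?, ?, ?)",
                 [(1, 10, 1), (2, 10, 0),   # message 10: Bob -> Amy
                  (2, 11, 1), (1, 11, 0)])  # message 11: Amy -> Bob


def messages_for(user_id):
    """Every message the user sent or received, oldest first, in one query."""
    return conn.execute("""
        SELECT m.message_id, m.message_body, r.is_sender
        FROM user_message_ref AS r
        JOIN message AS m ON m.message_id = r.message_id
        WHERE r.user_id = ?
        ORDER BY m.time_stamp
    """, (user_id,)).fetchall()


for row in messages_for(1):
    print(row)
```

No OR over sender/recipient columns is needed here; the link table does that work.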

Now, how do I turn that into a schema that works with MongoDB? I'm thinking I'll have a separate collection to hold the messages. The problem is, how can I differentiate between the sender and the recipient? If Bob logs in, what do I query against? Since Bob may or may not have initiated a given message, I don't want to have to query against both "sender" and "receiver" just to see whether the message belongs to him.

I posted to MongoDB's message group and came away with something that may work. Each conversation would be treated like a "blog" post. When a conversation is started, add the two users to an array (it doesn't matter initially who the sender and receiver are). Each response after that would be treated as a comment and appended to a second array.

MESSAGES

{
    "_id" : <objectID>,
    "users" : ["bob", "amy"],
    "user_msgs" :
        [
            { 
                "is_sender" : "bob",
                "msg_body" : "Hi Amy, how are you?!",
                "timestamp" : <generated by Mongo>
            },
            { 
                "is_sender" : "amy",
                "msg_body" : "Bob, long time no see, how is the family?!",
                "timestamp" : <generated by Mongo>
            }
        ]
}

This way I can query for conversations that involve "Bob", loop through the "user_msgs" array, tell who sent each message, and sort by the timestamp.
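The query-and-sort step described above can be sketched without a running server. In this hypothetical in-memory stand-in, threads plays the role of the MESSAGES collection and threads_for mimics the MongoDB filter {"users": "bob"}, which matches any document whose users array contains "bob" (with pymongo this would be something like db.messages.find({"users": "bob"}), where db.messages is an assumed collection name). The sample data and helper names are made up.

```python
from datetime import datetime

# Hypothetical stand-in for the MESSAGES collection; field names follow the post.
threads = [
    {
        "_id": 1,
        "users": ["bob", "amy"],
        "user_msgs": [
            # Stored out of order on purpose to show the sort.
            {"is_sender": "amy", "msg_body": "Bob, long time no see, how is the family?!",
             "timestamp": datetime(2010, 4, 20, 10, 40)},
            {"is_sender": "bob", "msg_body": "Hi Amy, how are you?!",
             "timestamp": datetime(2010, 4, 20, 10, 35)},
        ],
    },
    {"_id": 2, "users": ["amy", "carl"], "user_msgs": []},
]


def threads_for(user):
    # Mirrors {"users": user}: an equality match on an array field succeeds
    # when the array contains the value, so Bob's role does not matter.
    return [t for t in threads if user in t["users"]]


def conversation(thread):
    # Order the embedded replies by timestamp and label each one's sender.
    return [(m["is_sender"], m["msg_body"])
            for m in sorted(thread["user_msgs"], key=lambda m: m["timestamp"])]


for t in threads_for("bob"):
    for sender, body in conversation(t):
        print(f"{sender}: {body}")
```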

A: 

You are going to need some kind of link between the two collections (users and messages).

Personally, I would keep it simple and add two extra fields to track the IDs of the sender and recipient, something like this:

{
    _id: /* whatever_id */,
    message_body: "This is the message",
    date_sent: ISODate("2010-04-20T10:35:00Z"),
    sender_id: /*id_of_sender*/,
    recipient_id: /* id_of_recipient */
}

The sender_id and recipient_id fields would hold the _id of the corresponding document in the users collection (most likely an ObjectId, although you can assign whatever you like). You could then query these fields to grab the messages you are after (or count them, or whatever else).
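As a rough sketch of querying this flat schema, assuming MongoDB's $or operator: one query document matches messages where the user is either the sender or the recipient. The collection contents, the string IDs standing in for ObjectIds, and the helpers user_filter, matches, and inbox are hypothetical; matches is only a tiny local evaluator so the example runs without a server. Against a real collection you would pass the same filter to pymongo, for example db.messages.find(user_filter(uid)).sort("date_sent", 1) or db.messages.count_documents(user_filter(uid)).

```python
from datetime import datetime

# Hypothetical flat messages collection following the answer's schema.
messages = [
    {"_id": 1, "message_body": "Hi Amy", "date_sent": datetime(2010, 4, 20, 10, 35),
     "sender_id": "u_bob", "recipient_id": "u_amy"},
    {"_id": 2, "message_body": "Hi Bob", "date_sent": datetime(2010, 4, 20, 10, 40),
     "sender_id": "u_amy", "recipient_id": "u_bob"},
    {"_id": 3, "message_body": "Hi Carl", "date_sent": datetime(2010, 4, 20, 11, 0),
     "sender_id": "u_amy", "recipient_id": "u_carl"},
]


def user_filter(user_id):
    # MongoDB query document: "sent by OR received by user_id".
    return {"$or": [{"sender_id": user_id}, {"recipient_id": user_id}]}


def matches(doc, flt):
    # Local evaluator for this $or-of-equalities shape only (illustration);
    # a real server evaluates the filter itself.
    return any(all(doc.get(k) == v for k, v in clause.items())
               for clause in flt["$or"])


def inbox(user_id):
    hits = [m for m in messages if matches(m, user_filter(user_id))]
    return sorted(hits, key=lambda m: m["date_sent"])


print([m["_id"] for m in inbox("u_bob")])  # [1, 2]
print(len(inbox("u_amy")))                 # 3
```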

Another approach would be to do effectively the same thing, but use a formal DBRef for the sender and recipient rather than just storing their IDs. This would probably work just as well, but I'd tend to go with the previous solution because it is simpler and probably easier to query.

Either solution needs another round-trip to the DB to fetch the corresponding user documents (for displaying the "from" and "to" names, for example).
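That extra round-trip can at least be batched: collect every user ID referenced by a page of messages and fetch them all with a single {"_id": {"$in": [...]}} query. In this hypothetical sketch the users dict stands in for the users collection (with pymongo the lookup would be roughly db.users.find(user_lookup_filter(page))), and the data and helper names are invented for illustration.

```python
# Hypothetical users collection keyed by _id (strings stand in for ObjectIds).
users = {
    "u_bob": {"_id": "u_bob", "name": "Bob"},
    "u_amy": {"_id": "u_amy", "name": "Amy"},
}

# A page of messages already returned by the first query.
page = [
    {"_id": 1, "message_body": "Hi Amy", "sender_id": "u_bob", "recipient_id": "u_amy"},
    {"_id": 2, "message_body": "Hi Bob", "sender_id": "u_amy", "recipient_id": "u_bob"},
]


def user_lookup_filter(msgs):
    # One $in query covers every distinct user on the page.
    ids = sorted({m["sender_id"] for m in msgs} | {m["recipient_id"] for m in msgs})
    return {"_id": {"$in": ids}}


def with_names(msgs):
    wanted = user_lookup_filter(msgs)["_id"]["$in"]
    # Local stand-in for the second round-trip to the users collection.
    found = {uid: users[uid] for uid in wanted if uid in users}
    return [(found[m["sender_id"]]["name"], found[m["recipient_id"]]["name"],
             m["message_body"]) for m in msgs]


print(with_names(page))
```

However many messages are on the page, this costs two queries in total rather than one per message.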


Edit:
It appears I misunderstood what you are trying to achieve; I didn't know Facebook messaging incorporated any concept of threading. However, the solution you have presented above looks sound. Personally, I'd store the users' IDs rather than their names (amy & bob), but apart from that it looks pretty workable.
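A minimal sketch of the thread schema with user IDs in place of names, showing how a reply might be appended with MongoDB's $push operator. The thread document, the string IDs, and the helpers are hypothetical; the timestamp is supplied by the application rather than by MongoDB (an assumption); and apply_push is only a local emulation so the example runs without a server. With pymongo the update would go through something like db.messages.update_one({"_id": thread_id}, reply_update(...)).

```python
from datetime import datetime, timezone

# Hypothetical thread keyed by user IDs (strings stand in for ObjectIds).
thread = {"_id": "t1", "users": ["u_bob", "u_amy"], "user_msgs": []}


def reply_update(sender_id, body, now):
    # Update document that appends one reply to the embedded array.
    return {"$push": {"user_msgs": {"is_sender": sender_id,
                                    "msg_body": body,
                                    "timestamp": now}}}


def apply_push(doc, update):
    # Local emulation of a single-value $push, for this example only.
    for field, value in update["$push"].items():
        doc.setdefault(field, []).append(value)
    return doc


apply_push(thread, reply_update("u_bob", "Hi Amy, how are you?!",
                                datetime(2010, 4, 20, 10, 35, tzinfo=timezone.utc)))
apply_push(thread, reply_update("u_amy", "Bob, long time no see, how is the family?!",
                                datetime(2010, 4, 20, 10, 40, tzinfo=timezone.utc)))

print([m["is_sender"] for m in thread["user_msgs"]])  # ['u_bob', 'u_amy']
```

Because only IDs are stored, renaming a user never requires rewriting old threads; display names come from the users collection.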

Splash
This would make "threading" a nightmare. There's no way I would be able to show this in a conversational format à la Facebook.
luckytaxi
Hmm, I'm not familiar with Facebook's messaging. I just assumed each message was its own entity; I didn't realise it had the concept of threading. Apologies if I have misunderstood.
Splash
Hey, no problem. I thought of your idea at first, and it would make sense if I were creating another Twitter. Thanks for the advice though!
luckytaxi
You're right, I would use the IDs rather than usernames, in case a username changes for some reason.
luckytaxi
A: 

Figured it out. See my explanation above in the original post.

luckytaxi