I'm starting a project which I think will be particularly suited to MongoDB due to the speed and scalability it affords.

The module I'm currently interested in deals with real-time chat. If I were to do this in a traditional RDBMS I'd split it out into:

  • Channel (A channel has many users)
  • User (A user has one channel but many messages)
  • Message (A message has a user)

For the purpose of this use case, I'd like to assume that there will typically be 5 channels active at one time, each handling at most 5 messages per second.

Specific queries that need to be fast:

  • Fetch new messages (based on a bookmark: a timestamp maybe, or an incrementing counter?)
  • Post a message to a channel
  • Verify that a user can post in a channel

Bearing in mind that the document size limit in MongoDB is 4 MB, how would you go about designing the schema? What would yours look like? Are there any gotchas I should watch out for?

+1  A: 

Why use mongo for a messaging system? No matter how fast the static store is (and mongo is very fast), whether mongo or a db, to mimic a message queue you're going to have to use some kind of polling, which is not very scalable or efficient. Granted, you're not doing anything terribly intense, but why not just use the right tool for the job? Use a messaging system like RabbitMQ or ActiveMQ.

If you must use mongo (maybe you just want to play around with it and this project is a good chance to do that?) I imagine you'll have a collection for users (where each user object has a list of the queues that user listens to). For messages, you could have a collection for each queue, but then you'd have to poll each queue you're interested in for messages. Better would be to have a single collection as a queue, as it's easy in mongo to do "in" queries on a single collection, so it'd be easy to do things like "get all messages newer than X in any queues where queue.name in list [a,b,c]".
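As a rough sketch of the single-collection idea above: the field names (`channel`, `seq`, `user`, `text`) and the helper functions are made up for illustration, and the small `matches` function only stands in for the server so the example runs without one. With a real driver such as pymongo, the same filter dictionary would be passed to `find()` on the messages collection.

```python
# Hypothetical message documents, all stored in one "messages" collection.
# The field names are assumptions, not something the question specifies.
messages = [
    {"channel": "a", "seq": 1, "user": "ann", "text": "hi"},
    {"channel": "b", "seq": 2, "user": "bob", "text": "hello"},
    {"channel": "d", "seq": 3, "user": "dan", "text": "not subscribed"},
    {"channel": "a", "seq": 4, "user": "ann", "text": "anyone there?"},
]


def new_messages_filter(channels, last_seen):
    """Build a MongoDB-style filter: newer than last_seen, in any listed channel."""
    return {"channel": {"$in": channels}, "seq": {"$gt": last_seen}}


def matches(doc, flt):
    """Tiny in-memory stand-in for the server; supports only $in and $gt."""
    for field, cond in flt.items():
        value = doc.get(field)
        if "$in" in cond and value not in cond["$in"]:
            return False
        if "$gt" in cond and not (value is not None and value > cond["$gt"]):
            return False
    return True


flt = new_messages_filter(["a", "b", "c"], last_seen=1)
fresh = [m for m in messages if matches(m, flt)]
print([m["seq"] for m in fresh])
```

An incrementing `seq` is used as the bookmark here; a timestamp would work the same way in the filter, at the cost of having to think about ties.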

You might also consider setting up your collection as a mongo capped collection, which just means that you tell mongo when you create the collection that it should only hold X number of bytes, or X number of items. Once the collection is full, adding more items pushes out the oldest ones (First-In, First-Out), which is pretty much ideal for a message queue. But again, it's not really a messaging system.
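To make the eviction order of a capped collection concrete, here is a small in-memory analogy using a bounded `deque`. It is only an illustration of the behaviour, not MongoDB itself, and the cap of 3 is an arbitrary value chosen so the effect is visible.

```python
from collections import deque

# A deque with maxlen mimics the eviction order of a capped collection:
# once it is full, each append silently drops the oldest entry.
MAX_MESSAGES = 3  # hypothetical cap, tiny on purpose
capped = deque(maxlen=MAX_MESSAGES)

for seq in range(1, 6):
    capped.append({"seq": seq, "text": f"message {seq}"})

# Only the newest MAX_MESSAGES entries survive, still in insertion order.
remaining = [m["seq"] for m in capped]
print(remaining)
```

With pymongo, the real thing would be created with something like `db.create_collection("messages", capped=True, size=..., max=...)`, where the collection name and limits are up to you.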

Steve B.
Klinky
There are decent MQ solutions out there, I just find they're the ones without much in the way of features, ZeroMQ and Kestrel are both good for their purposes. ActiveMQ on the other hand is horrific.
Michael
@Klinky I bet almost any specific MQ solution (especially ActiveMQ) would deal with the messaging (EDA) problem many times better than a custom solution based on a NoSQL store of an unspecified type (did you mean a document-oriented DB, or a key-value store, or what?), because MQ solutions are designed for that problem, and, FTN ActiveMQ uses its own optimized high-performance data storage for queue persistence.
Vasil Remeniuk
@Steve B. "...,which is not very scalable or efficient" -- I don't agree on "scalable" (though I agree on efficiency and performance). Why? As opposed to storing queues in memory (which leads to problems if you have more than one node in your cluster -- you either need to set up replication or build a network of brokers), making multiple consumers work on a persisted queue seems to be less problematic (especially considering failure scenarios).
Vasil Remeniuk
Klinky
@Klinky Twitter developers have done a lot of weird stuff, you know :) (if you had a chance to read a book about Scala by one of Twitter's lead architects, you may guess how "good" their MQ solution is). Regarding ActiveMQ - personally I've had an extremely good experience with it (I was using it to build a merely huge distributed mass mailing system). ~30-60k/sec throughput is a basic setup with one broker - if you build a network of brokers, performance could be several times higher.
Vasil Remeniuk
@Vasil, to each his own I guess. I just found NoSQL more straightforward to get started with. I understand what a queue is and that I want to put stuff on it and take stuff off. Something like Redis makes this super easy to do. As far as Redis performance, I can push on to a queue about 35K msgs/sec and potentially retrieve from the queue at up to 400K msgs/sec. Tested on my Celeron E3200 1MB L2 @ 3.8 GHz overclock, inside Ubuntu VirtualBox w/ Intel VT enabled. Redis is not multi-threaded so this is only using 1 of 2 cores. I guess it depends on what you need your 'MQ' to do.
Klinky
>>> I can push on to a queue about 35K msgs/sec. Potentially retrieve from the queue at up to 400K msgs/sec. <<< Hehe) Sounds interesting and promising. I haven't had much chance to get my hands dirty with Redis, and would be happy to glance over a good architecture that uses it -- is your MQ solution part of a proprietary system, or is it open?
Vasil Remeniuk
A: 

I used Redis, NGINX & PHP-FPM for my chat project. Not super elegant, but it does the trick. There are a few pieces to the puzzle.

  1. There is a very simple PHP script that receives client commands and puts them in one massive LIST. It also checks all room LISTs and the user's private LIST to see if there are messages it must deliver. A client written in jQuery polls this script every few seconds.

  2. There is a command-line PHP script that runs server-side in an infinite loop, 20 times per second, checking this list and processing the commands. The script keeps track of who is in what room, and of permissions, in its own memory; this info is not stored in Redis.

  3. Redis has a LIST for each room and a LIST for each user, which operates as a private queue. It also holds multiple counters, one for each room the user is in. If the user's counter is less than the total number of messages in the room, the script gets the difference and sends it to the user.
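The counter scheme from item 3 can be sketched in a few lines. This is an in-memory approximation rather than the actual PHP/Redis code: the dictionaries stand in for the Redis LISTs and counters, and the names `post` and `fetch_new` are invented for the example.

```python
from collections import defaultdict

room_messages = defaultdict(list)  # room name -> messages (stands in for a Redis LIST)
user_counters = defaultdict(int)   # (user, room) -> messages already delivered


def post(room, text):
    """Append a message to the room's list (RPUSH in Redis terms)."""
    room_messages[room].append(text)


def fetch_new(user, room):
    """Return the messages the user has not seen yet and advance the counter."""
    seen = user_counters[(user, room)]
    total = len(room_messages[room])        # LLEN in Redis terms
    if seen >= total:
        return []
    new = room_messages[room][seen:total]   # roughly LRANGE room <seen> -1
    user_counters[(user, room)] = total
    return new


post("lobby", "hello")
post("lobby", "anyone here?")
first = fetch_new("ann", "lobby")
post("lobby", "yes")
second = fetch_new("ann", "lobby")
third = fetch_new("ann", "lobby")
print(first, second, third)
```

Note that this only works while old messages are never trimmed from the room's list; once the list is trimmed, the counters need to be adjusted too.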

I haven't been able to stress test this solution, but from my basic benchmarking it could probably handle many thousands of messages per second. There is also the opportunity to port this to something like Node.js to increase performance. Redis is also maturing and has some interesting features, such as the PUBLISH/SUBSCRIBE commands, which might remove the need for polling on the server side.
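For anyone unfamiliar with the publish/subscribe idea, here is a minimal in-process sketch of why it removes polling: subscribers register once and are called as soon as something is published. The `Broker` class is hypothetical and only imitates the pattern; it is not how Redis implements it.

```python
from collections import defaultdict


class Broker:
    """Toy publish/subscribe hub: messages are pushed to subscribers, never polled."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver to every subscriber and return how many received the message."""
        callbacks = self._subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
inbox = []
broker.subscribe("lobby", inbox.append)

delivered = broker.publish("lobby", "hi")
missed = broker.publish("empty-room", "nobody listening")
print(inbox, delivered, missed)
```

As in this toy version, a message published to a channel with no subscribers is simply lost, so a LIST (or a database) is still needed if users should see messages sent while they were away.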

I looked into Comet-based solutions, but many of them were complicated, poorly documented, or would require me to learn an entirely new language (e.g. Jetty -> Java, APE -> C). Delivery through proxies can also sometimes be an issue with Comet. That is why I've stuck with polling.

I imagine you could do something similar with MongoDB: a collection per room, a collection per user, and a collection which maintains counters. You'll still need to write a back-end daemon or script to handle managing where these messages go. You could also use MongoDB's capped collections, which keep the documents in insertion order and automatically clear old messages out, but that could complicate maintaining proper counters.

Klinky
A: 

I want to use MongoDB for a social networking iPhone app. One query will be to return people near you with the same interests (e.g. hockey). I know MongoDB can do that: http://www.mongodb.org/display/DOCS/Geospatial+Indexing but I also want to have user-to-user real-time chat and chat rooms for every interest. Is MongoDB good for chat applications? Are there any open source examples? I.e. how do you build a chat application with MongoDB?

MattDiPasquale
A: 

1) ape-project.org

2) http://code.google.com/p/redis/

3) after you're through all this, you can dump data into mongodb for logging, and store consistent data (users, channels) there as well

Tobias