I'm starting a MongoDB project just for kicks and as a chance to learn MongoDB/NoSQL schemas. It'll be a live chat app, and the stack includes Rails 3, Ruby 1.9.2, Devise, Mongoid/MongoDB, CarrierWave, Redis and jQuery.
I'll be handling the live chat polling/message queueing separately; I'm not sure how yet (Node.js, APE or a custom EventMachine app). As for Mongo, I'm planning to use it for everything else in the app, specifically chat logs and historical transcripts.
My question is how best to design the schema, since all my previous experience has been with MySQL and relational DB schemas. As a sub-question: when is it best to use embedded documents vs. related documents?
The app will have:
- Multiple accounts which have multiple rooms
- Multiple rooms
- Multiple users per room
- List of rooms a user is allowed to be in
- Multiple user chats per room
- Searchable chat logs on a per room and per user basis
- Optional file attachment for a given chat
Given that Mongo (at least the last time I checked) has a 4MB document size limit, I don't think having a rooms collection that stores all of a room's chats as embedded documents would work out so well.
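Some rough arithmetic shows why. Assuming an average of about 200 bytes per embedded chat message (an illustrative guess, not a measured figure) and ignoring the room's own fields, a single room document would be full after roughly 21,000 messages, a cap an active room would eventually exceed since chat history grows without bound. Newer MongoDB releases raise the limit to 16MB, which only postpones the problem.

```ruby
# Back-of-the-envelope estimate: how many embedded chat messages fit in one
# room document before hitting the 4MB document size limit.
# AVG_MESSAGE_BYTES is an assumed, illustrative figure, not a measurement.
DOC_LIMIT_BYTES   = 4 * 1024 * 1024
AVG_MESSAGE_BYTES = 200

def max_embedded_messages(limit_bytes, avg_bytes)
  limit_bytes / avg_bytes # integer division rounds down
end

puts max_embedded_messages(DOC_LIMIT_BYTES, AVG_MESSAGE_BYTES) # prints 20971
```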
From what I've thought about so far, I'm thinking of doing something like:
- A collection for accounts
- A collection for rooms
    - Each room relates back to an account
    - Related documents in the chats collection for all chat messages in the room
    - Embedded document listing all users currently in the room
- A collection for users
    - Embedded document listing all the rooms the user is currently in
    - Embedded document listing all the rooms the user is allowed to be in
- A collection for chats
    - Each chat relates back to a room in the rooms collection
    - Each chat relates back to a user in the users collection
    - Embedded document with info about the optional uploaded file attachment
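For illustration, the layout above could produce documents shaped roughly like the following. This is a plain-Ruby sketch using hashes in place of BSON documents and hypothetical string IDs in place of ObjectIds; field names such as `account_id`, `allowed_room_ids` and `attachment` are assumptions, not anything Mongoid generates. Fields holding IDs are references that need a second lookup to resolve, while nested hashes and arrays are embedded and come back with the parent document.

```ruby
# Hypothetical sample documents for the proposed layout, as plain Ruby hashes.
# String IDs stand in for MongoDB ObjectIds; every field name is an assumption.
ACCOUNT = { _id: "acct1", name: "Example Co" }

ROOM = {
  _id: "room1",
  account_id: "acct1",                 # reference: the account owning the room
  current_users: [                     # embedded: users in the room right now
    { user_id: "user1", name: "alice" }
  ]
}

USER = {
  _id: "user1",
  name: "alice",
  current_room_ids: ["room1"],         # embedded: rooms the user is in now
  allowed_room_ids: ["room1", "room2"] # embedded: rooms the user may join
}

CHAT = {
  _id: "chat1",
  room_id: "room1",                    # reference into the rooms collection
  user_id: "user1",                    # reference into the users collection
  body: "Hello, room!",
  created_at: Time.utc(2011, 3, 1, 12, 0, 0),
  attachment: {                        # embedded, optional (nil when no file)
    filename: "notes.pdf",
    content_type: "application/pdf"
  }
}

# Following a reference means matching IDs across documents;
# embedded data is already present in the parent hash.
puts CHAT[:room_id] == ROOM[:_id]                 # prints true
puts USER[:allowed_room_ids].include?(ROOM[:_id]) # prints true
```

The rule of thumb this sketch follows: embed data that is small, bounded and always read with its parent (attachment metadata, the current-users list), and keep unbounded data that is queried on its own (chat messages) in a separate collection.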
My main concern is how far I can go before this ends up looking like a relational schema and defeats the purpose. There is definitely more relating than embedding going on.
Another concern is that, from what I've heard, referencing related documents is much slower than accessing embedded ones.
I want to make generic queries such as:
- Give me all rooms for an account
- Give me all chats in a room (or filtered via date range)
- Give me all chats from a specific user
- Give me all uploaded files in a given room or for a given org
- etc.
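The queries above mostly reduce to filtering the chats collection on its reference fields. Here is a plain-Ruby sketch of them, with arrays standing in for the rooms and chats collections; the method names and sample data are hypothetical, and "org" is assumed to mean an account. Optional positional arguments are used so the sketch also runs on Ruby 1.9.

```ruby
# In-memory stand-ins for the rooms and chats collections (hypothetical data).
ROOMS = [
  { _id: "r1", account_id: "a1" },
  { _id: "r2", account_id: "a1" },
  { _id: "r3", account_id: "a2" }
]

CHATS = [
  { _id: "c1", room_id: "r1", user_id: "u1", created_at: Time.utc(2011, 1, 1), attachment: nil },
  { _id: "c2", room_id: "r1", user_id: "u2", created_at: Time.utc(2011, 1, 5), attachment: { filename: "a.png" } },
  { _id: "c3", room_id: "r2", user_id: "u1", created_at: Time.utc(2011, 2, 1), attachment: { filename: "b.pdf" } },
  { _id: "c4", room_id: "r3", user_id: "u3", created_at: Time.utc(2011, 2, 2), attachment: nil }
]

# All rooms for an account.
def rooms_for_account(account_id)
  ROOMS.select { |r| r[:account_id] == account_id }
end

# All chats in a room, optionally limited to an inclusive time range.
def chats_in_room(room_id, from = nil, to = nil)
  CHATS.select do |c|
    c[:room_id] == room_id &&
      (from.nil? || c[:created_at] >= from) &&
      (to.nil? || c[:created_at] <= to)
  end
end

# All chats from a specific user, across every room.
def chats_by_user(user_id)
  CHATS.select { |c| c[:user_id] == user_id }
end

# All attachments in a room (chats without a file are skipped).
def files_in_room(room_id)
  chats_in_room(room_id).map { |c| c[:attachment] }.compact
end

# All attachments for an account: resolve its room IDs first, then filter
# chats. This two-step lookup is the cost of chats only referencing rooms.
def files_for_account(account_id)
  room_ids = rooms_for_account(account_id).map { |r| r[:_id] }
  CHATS.select { |c| room_ids.include?(c[:room_id]) }.map { |c| c[:attachment] }.compact
end

p rooms_for_account("a1").map { |r| r[:_id] }                   # prints ["r1", "r2"]
p chats_in_room("r1", Time.utc(2011, 1, 2)).map { |c| c[:_id] } # prints ["c2"]
p files_for_account("a1").map { |f| f[:filename] }              # prints ["a.png", "b.pdf"]
```

In MongoDB itself, each filter would become a query on the referencing field (for example `room_id` with a `created_at` range, or `attachment` with `$exists`), so those fields would likely want indexes. `files_for_account` needs two steps because chats only reference rooms; also storing `account_id` on each chat is one possible way to trade some duplication for a single query.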
Any suggestions on how to structure the schema efficiently in a way that scales? Thanks everyone.