I'm thinking of creating a multi-tenant app using MongoDB. I don't have any guesses in terms of how many tenants I'd have yet, but I would like to be able to scale into the thousands.

I can think of three strategies:

  1. All tenants in the same collection, using tenant-specific fields for security
  2. 1 Collection per tenant in a single shared DB
  3. 1 Database per tenant
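
To make the three options concrete, here is a rough sketch of where one tenant's documents would live under each strategy. It is plain Python with no driver calls, and every name in it (`locate`, `Target`, the `app` database, the `tenant_id` field, the `orders` entity) is a hypothetical placeholder, not something MongoDB prescribes.

```python
from typing import NamedTuple


class Target(NamedTuple):
    """Where a tenant's documents live, plus the filter every query must carry."""
    database: str
    collection: str
    query_filter: dict


def locate(strategy: int, tenant_id: str, entity: str = "orders") -> Target:
    """Map a tenant to a database/collection/filter for strategies 1-3.

    All naming conventions here are illustrative assumptions.
    """
    if strategy == 1:
        # Shared collection: isolation depends on always filtering by tenant.
        return Target("app", entity, {"tenant_id": tenant_id})
    if strategy == 2:
        # One collection per tenant (prefixed by tenant) in a shared database.
        return Target("app", f"{tenant_id}_{entity}", {})
    if strategy == 3:
        # One database per tenant.
        return Target(f"tenant_{tenant_id}", entity, {})
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for s in (1, 2, 3):
        print(s, locate(s, "acme"))
```

The point of the sketch: only option 1 needs a tenant filter on every single query, so a forgotten filter there leaks data across tenants, while options 2 and 3 push the isolation into the collection or database name.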

The voice in my head is suggesting that I go with option 2.

Thoughts and implications, anyone?

A: 

There is a reasonable article on MSDN about multi-tenant data architecture which you might wish to refer to. Some key topics touched on by this article:

  • Economic considerations
  • Security
  • Tenant considerations
  • Regulatory (legal)
  • Skill set concerns

Also touched upon are some patterns for Software as a Service (SaaS) configuration.

Additionally, worth a gander is an interesting write-up from the SQL Anywhere guys.

My own personal take - unless you are certain of enforced security / trust, I would go with option 3, or, if scalability concerns prohibit that, fall back to option 2 at a minimum. That said... I'm no pro with MongoDB. I get pretty nervous using a shared "schema" - but I will happily defer to more experienced practitioners.

AJ
I'm familiar with that MSDN article, as my original plan was to use a relational database. My data is quite unstructured, however, which now has me investigating NoSQL dbs like MongoDB. It doesn't seem that MongoDB has ACL support the way Lotus Domino does, and I don't really want to reinvent the wheel, which makes me also think 2 or 3 are the way to go. I also don't know if there are limits that I may encounter in terms of # of collections or dbs allowed in MongoDB though.
Braintapper
+2  A: 

I found a good answer in the comments in this link:

http://blog.boxedice.com/2010/02/28/notes-from-a-production-mongodb-deployment/

Basically option #2 seems to be the best way to go.

Quote from David Mytton's comment:

We decided not to have a database per customer because of the way MongoDB allocates its data files. Each database uses its own set of files:

The first file for a database is dbname.0, then dbname.1, etc. dbname.0 will be 64MB, dbname.1 128MB, etc., up to 2GB. Once the files reach 2GB in size, each successive file is also 2GB.

Thus if the last datafile present is say, 1GB, that file might be 90% empty if it was recently reached.

from the manual.

As users sign up to the trial and give things a go, we’d get more and more databases that were at least 2GB in size, even if the whole of the data file wasn’t used. We found this used a massive amount of disk space compared to having several databases for all customers where the disk space can be used to maximum efficiency.

Sharding will be on a per collection basis as standard which presents a problem where the collection never reaches the minimum size to start sharding, as is the case for quite a few of ours (e.g. collections just storing user login details). However, we have requested that this will also be able to be done on a per database level. See http://jira.mongodb.org/browse/SHARDING-41

There are no performance tradeoffs using lots of collections. See http://www.mongodb.org/display/DOCS/Using+a+Large+Number+of+Collections
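
To see why per-database files add up, here is a back-of-the-envelope sketch of the allocation scheme quoted above (64MB first file, doubling up to a 2GB cap). It is plain arithmetic, not a MongoDB API: the function name and parameters are made up for illustration, and it ignores any extra file the server may preallocate ahead of need as well as namespace files.

```python
def datafile_sizes_mb(data_mb, first_mb=64, cap_mb=2048):
    """Return the data file sizes (MB) needed to hold data_mb of data.

    Assumes the scheme from the quoted manual text: the first file is
    first_mb, each following file doubles, and no file exceeds cap_mb.
    A database always gets at least its first file.
    """
    sizes = []
    size = first_mb
    total = 0
    while total < data_mb or not sizes:
        sizes.append(size)
        total += size
        size = min(size * 2, cap_mb)
    return sizes


if __name__ == "__main__":
    for data_mb in (1, 100, 2000):
        sizes = datafile_sizes_mb(data_mb)
        allocated = sum(sizes)
        print(f"{data_mb} MB of data -> files {sizes}, "
              f"{allocated} MB allocated, {allocated - data_mb} MB unused")
```

Under these assumptions a tenant with 2000MB of data ends up with 4032MB of files, and even a near-empty trial database holds a full first file. Multiply that by thousands of tenants and the per-database overhead becomes the dominant cost.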

Braintapper
A: 

I would go for option 2.

However, you could start mongod.exe with the --smallfiles command line option. This caps the largest data file at 0.5 gigabytes instead of 2 gigabytes. I tested this with mongo 1.42. So option 3 is not impossible.

TTT