views:

1059

answers:

3

Our software currently runs on MySQL. The data of all tenants is stored in the same schema. Since we are using Ruby on Rails we can easily determine which data belongs to which tenant. However there are some companies of course who fear that their data might be compromised, so we are evaluating other solutions.

So far I have seen three options:

  • Multi-Database (each tenant gets its own - nearly the same as 1 server per customer)
  • Multi-Schema (not available in MySQL, each tenant gets its own schema in a shared database)
  • Shared Schema (our current approach, maybe with additional identifying record on each column)

Multi-Schema is my favourite (considering costs). However creating a new account and doing migrations seems to be quite painful, because I would have to iterate over all schemas and change their tables/columns/definitions.

Q: Multi-Schema seems to be designed to have slightly different tables for each tenant - I don't want this. Is there any RDBMS which allows me to use a multi-schema multi-tenant solution, where the table structure is shared between all tenants?

P.S. By multi I mean something like ultra-multi (10.000+ tenants).

+2  A: 

However there are some companies of course who fear that their data might be compromised, so we are evaluating other solutions.

This is unfortunate, as customers sometimes suffer from a misconception that only physical isolation can offer enough security.

There is an interesting MSDN article, titled Multi-Tenant Data Architecture, which you may want to check. This is how the authors addressed the misconception towards the shared approach:

A common misconception holds that only physical isolation can provide an appropriate level of security. In fact, data stored using a shared approach can also provide strong data safety, but requires the use of more sophisticated design patterns.

As for technical and business considerations, the article makes a brief analysis on where a certain approach might be more appropriate than another:

The number, nature, and needs of the tenants you expect to serve all affect your data architecture decision in different ways. Some of the following questions may bias you toward a more isolated approach, while others may bias you toward a more shared approach.

  • How many prospective tenants do you expect to target? You may be nowhere near being able to estimate prospective use with authority, but think in terms of orders of magnitude: are you building an application for hundreds of tenants? Thousands? Tens of thousands? More? The larger you expect your tenant base to be, the more likely you will want to consider a more shared approach.

  • How much storage space do you expect the average tenant's data to occupy? If you expect some or all tenants to store very large amounts of data, the separate-database approach is probably best. (Indeed, data storage requirements may force you to adopt a separate-database model anyway. If so, it will be much easier to design the application that way from the beginning than to move to a separate-database approach later on.)

  • How many concurrent end users do you expect the average tenant to support? The larger the number, the more appropriate a more isolated approach will be to meet end-user requirements.

  • Do you expect to offer any per-tenant value-added services, such as per-tenant backup and restore capability? Such services are easier to offer through a more isolated approach.


UPDATE: Further to update about the expected number of tenants.

That expected number of tenants (10k) should exclude the multi-database approach, for most, if not all scenarios. I don't think you'll fancy the idea of maintaining 10,000 database instances, and having to create hundreds of new ones every day.

From that parameter alone, it looks like the shared-database, single-schema approach is the most suitable. The fact that you'll be storing just about 50Mb per tenant, and that there will be no per-tenant add-ons, makes this approach even more appropriate.

The MSDN article cited above mentions three security patterns that tackle security considerations for the shared-database approach:

When you are confident with your application's data safety measures, you would be able to offer your clients a Service Level Agrement that provides strong data safety guarantees. In your SLA, apart from the guarantees, you could also describe the measures that you would be taking to ensure that data is not compromised.

Daniel Vassallo
Oh, I scanned that article yesterday and skipped that misconception-part. Need to read it again.
Marcel J.
@Marcel: However, apart from what the customers' perception of security is, I believe your decision on which multi-tenant approach to take should be based on factors like those 4 points I quoted from the MSDN article: 1. Expected number of tenants. - 2. Expected storage requirement for each tenant. - 3. Expected number of concurrent end-users. - 4. Expected per-tenant addons.
Daniel Vassallo
Thanks for pointing out that section. Number = 10k, Storage = 50mb, Concurrent End-Users = 2 per tenant, Addons = 0. So the current situation having a shared approach seems to be the most reasonable. I think I'll do some calls next week to find out what customers really need / expect. Germany and data/IT-security is a really tough story.
Marcel J.
+3  A: 

My experience (albeit SQL Server) is that multi-database is the way to go, where each client has their own database. So although I have no mySQL or Ruby On Rails experience, I'm hoping my input might add some value.

The reasons why include :

  1. data security/disaster recovery. Each companies data is stored entirely separately from others giving reduced risk of data being compromised (thinking things like if you introduce a code bug that means something mistakenly looks at other client data when it shouldn't), minimizes potential loss to one client if one particular database gets corrupted etc. The perceived security benefits to the client are even greater (added bonus side effect!)
  2. scalability. Essentially you'd be partitioning your data out to enable greater scalability - e.g. databases can be put on to different disks, you could bring multiple database servers online and move databases around easier to spread the load.
  3. performance tuning. Suppose you have one very large client and one very small. Usage patterns, data volumes etc. can vary wildly. You can tune/optimise easier for each client should you need to.

I hope this does offer some useful input! There are more reasons, but my mind went blank. If it kicks back in, I'll update :)

EDIT:
Since I posted this answer, it's now clear that we're talking 10,000+ tenants. My experience is in hundreds of large scale databases - I don't think 10,000 separate databases is going to be too manageable for your scenario, so I'm now not favouring the multi-db approach for your scenario. Especially as it's now clear you're talking small data volumes for each tenant!

Keeping my answer here as anyway as it may have some use for other people in a similar boat (with fewer tenants)

AdaTheDev
Yeah, sorry that I didn't clarify that earlier. Still +1. ;)
Marcel J.
+1  A: 

Below is a link to a white-paper on Salesforce.com about how they implement multi-tenancy:

http://www.salesforce.com/au/assets/pdf/Force.com_Multitenancy_WP_101508.pdf

They have 1 huge table w/ 500 string columns (Value0, Value1, ... Value500). Dates and Numbers are stored as strings in a format such that they can be converted to their native types at the database level. There are meta data tables that define the shape of the data model which can be unique per tenant. There are additional tables for indexing, relationships, unique values etc.

Why the hassle?

Each tenant can customize their own data schema at run-time without having to make changes at the database level (alter table etc). This is definitely the hard way to do something like this but is very flexible.

dana