views:

2041

answers:

10

Hi everyone,

I am working on a PHP application that intends to ease company workflow and project management, let's say something like Basecamp and GoPlan.

I am not sure on what the best approach is, database-wise. Should I use a single database and add client-specific columns to each of the tables, or should I create a database for each new client? An important factor is automation: I want it to be dead simple to create a new client (and perhaps opening the possibility to signing up for yourself).

Possible cons I can think of using one database:

  • Lack of extensibility
  • Security problems (although bugs shouldn't be there in the first place)

What are your thoughts on this? Do you have any ideas what solution the above companies are most likely to have chosen?

+5  A: 

For multitenancy, performance will typically increase the more resources you manage to share across tenants, see

http://en.wikipedia.org/wiki/Multitenancy

So if you can, go with the single database. I agree that security problems would only occur due to bugs, as you can implement all access control in the application. In some databases, you can still use the database access control by careful use of views (so that each authenticated user gets a different view).

There are ways to provide extensibility also. For example, you could create a single table with extension attributes (keyed by tenant, base record, and extension attribute id). Or you can create per-tenant extension tables, so that each tenant has his own extension schema.

Martin v. Löwis
+11  A: 

I usually add ClientID to all tables and go with one database. But since the database is usually hard to scale I will also make it possible to run on different database instances for some or all clients.

That way you can have a bunch of small clients in one database and the big ones on separate servers.

A key factor for maintainability though, is that you keep the schema identical in all databases. There will be headache enough to manage the versioning without introducing client specific schemas.

idstam
Yeah, classic example of sharding. You can also move clients to different database for maintenance, etc. The key is to build the tools to move data around and an API to find what server an account is on.. Once that's done, the sky is the limit.
Gary Richardson
Thanks for your answer, I'll look into this solution. It seems my the best solution in this case will be using one 'generic' database, and setup a custom database for client-specific modules.
Aron Rotteveel
+4  A: 

In my view, it will depend on your likely customer base. If you could get into a situation where arch-rivals are both using your system, then you would be better off with separate databases. It also depends on how multiple databases get implemented by your DBMS. If each database has a separate copy of the infrastructure, then that suggests a single database (or a change of DBMS). If multiple databases can be served by a single copy of the infrastructure, then I'd go for separate databases.

Think of database backup. Customer A says "Please send me a copy of my data". Much, much easier in a separate database setup than if a single database is shared. Think of removing a customer; again, much easier with separate databases.

(The 'infrastructure' part is mealy-mouthed because there are major differences between different DBMS about what constitutes a 'database' versus a 'server instance', for example. Add: The question is tagged 'mysql', so maybe those thoughts aren't completely relevant.)

Add: One more issue - with multiple customers in a single database, every SQL query is going to need to ensure that the data for the correct customer is chosen. That means that the SQL is going to be harder to write, and read, and the DBMS is going to have to work harder on processing the data, and indexes will be bigger, and ... I really would go with a separate database per customer for many purposes.

Clearly, StackOverflow (as an example) does not have a separate database per user; we all use the same database. But if you were running accounting systems for different companies, I don't think it would be acceptable (to the companies, and possibly not to the legal people) to share databases.

Jonathan Leffler
+3  A: 

I had the same question. Here are some of the answers I got. http://stackoverflow.com/questions/69128/saas-database-design-multiple-databases-split

Check out the slides on LinkedIn's architecture

Vyrotek
Thanks for your answer, this is a nice resource to look into!
Aron Rotteveel
+5  A: 

Listen to the Stackoverflow podcast where Joel and Jeff talk about the very same question. Joel is talking about their experience offering a hosted version of their software. He points out that adding client ids all over your DB complicates the design and code (are you sure you didn't accidentally forget to add it to some WHERE clause?) and complicates hosting feature, such as client-specific backups.

It was in episode #20 or #21 (check the transcripts for details).

Philipp Schmid
it's episode #19 @ [50:45] => https://stackoverflow.fogbugz.com/default.asp?W24218
vitorsilva
+2  A: 

Another point to consider is that you may have a legal obligation to keep one companies' data separate from anothers'.

da5id
Thanks for your answer; this a valid point to take into consideration
Aron Rotteveel
+1  A: 

Having a database per client generally does not scale well. MySQL (and probably other databases) holds resources open per table, this does not lend itself well to 10k+ tables on one instance, which would happen in a large-scale multitenancy situation.

Of course, if you have some other issue which causes other problems before you get to this level, this may not be relevant.

Additionally, "sharding" a multi-tenant application is likely€ to be the right thing to do eventually as your application gets bigger and bigger.

Sharding does not however mean one database (or instance) per tenant, but one per shard or set of shards, which may have several tenants each. You will need to discover the right tuning parameters for yourself, probably in production (hence it probably needs to be pretty tunable from the outset)

€ I can't guarantee it.

MarkR
A: 

You can start with a single database and partition it as the application grows. If you do this, there a few things I would recommend:

1) Design the database in a way that it can be easily partitioned. For example, if customers are going to share data, make sure that data is easily replicated across each database.

2) When you have only one database, make sure it is being backed up to another physical server. In the event of a failover you can revert traffic to this other server and still have your data intact.

jjriv
+4  A: 
  • DEVELOPMENT For rapid development, use a database per customer. Think how easy it will be to backup, restore, or delete a customer's data. Or to measure/monitor/bill usage. You won't need to write code to do it by yourself, just use your database primitives.

  • PERFORMANCE For performance, use a database for all. Think about connection pooling, shared memory, caching, etc.

  • BUSINESS If your business plan is to have lots of small customers (think hotmail) you should probably work on a single DB. And have all administrative tasks such registration, deletion, data migration, etc. fully automated and exposed in a friendly interface. If you plan to have dozens or up to a few hundreds of big customers then you can work in one DB per customer and have system administration scripts in place that can be operated by your customer support staff.

flybywire
+1  A: 

The following screencast explains how it's done on salesforce.com. They use one database with a special column OrgId which identifies each tenant's data. There's much more to that so you should look into this. I'd go with their approach.

There's another great article about that on MSDN. It explains in depth when you should use a shared or isolated approach. Remember that having a shared DB for all your tenants has some important security implications and if all of them share same DB objects you might want to use [row level security] - depending on the DBMS you use (I'm sure it's possible in MS SQL Server and Oracle, probably in IBM DB2 also). You can use tricks like row level security in mySQL to achieve similar results (views + triggers).

brainnovative