views:

3609

answers:

9

I've seen SaaS applications hosted in many different ways. Is it a good idea to split features and modules across multiple databases? For example, putting things like the User table on one DB and feature/app specific tables on another DB and perhaps other commonly shared tables in another DB?

A: 

Ask yourself: What do you gain by moving everything into separate databases?

A lot of pain in terms of management would be my guess. I'd be more keen personally to have everything in a single database and if you hit issues that cannot be solved by a single database later then migrate the data into multiple databases.

marshall
What about performance and Scalability?
Vyrotek
A: 

Having a single database is best for data integrity because then you can use foreign keys. You can't have this built-in data integrity if you split the data into multiple databases. This isn't an issue if your data isn't related, but if it is related, it would be possible for your one database to contain data that is inconsistent with another database. In this case, you would need to write some code that scans your databases for inconsistent data on a regular basis so you can handle it appropriately.

However, multiple databases may be necessary if you need your site/application to be highly scalable (e.g. internet scale). For example, you could host each database on a different physical server.

Chris Gillum
+2  A: 

Splitting the database by features might not be a good idea unless you see strong evidence suggesting the need. Often you might need to update two databases as part of a single transactions - and distributed transactions are much more harder to work with. Furthermore, if the database needs to be split, you might be able to employ sharding.

binil
+5  A: 

High Scalability is a good blog for scaling SaaS applications. As mentioned, splitting tables across databases as you suggested is generally a bad idea. But a similar concept is sharding, where you keep the same (or similar) schema, but split the data on multiple servers. For example, users 1-5000 are on server1, and users 5000-10000 on server2. Depending on the queries your application uses, it can be an efficient way to scale.

Tom Ritter
+3  A: 

For SaaS applications, you use multiple databases for multiple tenants, but usually don't split it module-wise.

This is the most common model I have seen in SaaS application design. Your base schema is replicated for each tenant that you add to your application.

Vaibhav
+10  A: 

Start with one database. Split data/functionality when project requires it.

Here is what we can learn from LinkedIn:

  • A single database does not work
  • Referential integrity will not be possible
  • Any data loss is a problem
  • Caching is good even when it's modestly effective
  • Never underestimate growth trajectory

Source:

LinkedIn architecture

LinkedIn communication architecture

Sergey Kornilov
A: 

Keep it a natural design (denormalize as much as needed, normalize as less as required). Split the DB Model into its modules and keep the service oriented principles in mind by fronting data with a service (that owns the data).

Daniel Fisher lennybacon
+1  A: 

There are a variety of ways to accomplish it, but the issues of multi-tenancy go deeper than just the data model. I hate to be plugging product, but check out SaaSGrid by my the company I work at, Apprenda.We're a cloud operating system that allows you to write single tenant SOA apps (feel free to use NHibernate for data access) that automatically injects multi-tenancy into your app. When you publish your app, you can do things like choose a data model (isolated database or shared) and SaaSGrid will deploy accordingly and your app will run without any code changes - just write code as if it were for a single tenant!

A: 

Why to use database at all ?

I think it's good idea to use distributed storage systems like Hadoop, Voldemort (project-voldemort.com developed and used by LinkedIn).

I think db good for sensetive data like money operations , but for everything else you can use distributed storages.