views: 200

answers: 4

In Stack Overflow podcast no. 19, Joel describes Fog Creek's decision to have one database PER client instead of one database for ALL clients. That kind of got me thinking about the following.

  1. Assuming I have 1000 users.
  2. Each user has 100 clients.
  3. Each client has 1000 products.

So that means I'm gonna have 1000 x 100 x 1000 = 100,000,000 products associated with users. Now if I run a join query for one user and all of his clients' products, what would be a reasonable query time if I use just a single database for everything?
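For illustration, the kind of join I have in mind would look something like this (the table and column names here are just my own assumptions, not a real schema):

    -- all products belonging to every client of one user
    SELECT p.*
    FROM users u
    JOIN clients c  ON c.user_id = u.user_id
    JOIN products p ON p.client_id = c.client_id
    WHERE u.user_id = 42;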

UPDATE

Maybe I wasn't clear enough in my question. Assume I need to do all sorts of funky queries (min, max, group by, etc.) on the dataset described above. Would it be slow enough that it makes better sense to adopt a multiple-database strategy, e.g. one DB per client, database sharding, etc.?
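By "funky queries" I mean things along these lines (again, a hypothetical schema):

    -- min/max price and product count per client, for one user
    SELECT c.client_id,
           MIN(p.price) AS min_price,
           MAX(p.price) AS max_price,
           COUNT(*)     AS product_count
    FROM clients c
    JOIN products p ON p.client_id = c.client_id
    WHERE c.user_id = 42
    GROUP BY c.client_id;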

A: 

You should have user and/or client as part of the PK in most cases, so your performance in the multi-tenant scenario should not be impacted much differently from the single-tenant case.
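A minimal sketch of what that might look like (hypothetical DDL, using the table names from the question):

    -- client_id leads the PK, so per-tenant lookups stay on one index range
    CREATE TABLE products (
        client_id  INT NOT NULL,
        product_id INT NOT NULL,
        name       VARCHAR(100),
        price      DECIMAL(10,2),
        PRIMARY KEY (client_id, product_id)
    );

With client_id leading the key, a query filtered on one client touches a contiguous slice of the index no matter how many other tenants share the table.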

Cade Roux
A: 

If you want to get all of that (all the columns and all the rows, with no filtering or aggregation), then you'll have to wait a very long time. I don't think there is any reasonable amount of time you can use as a benchmark here. You just have to wait :)

But those types of queries rarely happen (unless you are migrating your data, and even then there are alternatives). What usually happens is that you return a subset of rows, a page of rows, a selection of columns, or aggregates. In that case, the "reasonable amount" depends on the type of query and the computations involved.
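For example, a paged query of that sort might look like this (LIMIT/OFFSET syntax as in MySQL/PostgreSQL; names are assumptions):

    -- one page of a client's products, 50 rows at a time
    SELECT product_id, name, price
    FROM products
    WHERE client_id = 7
    ORDER BY product_id
    LIMIT 50 OFFSET 100;   -- rows 101-150, i.e. page 3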

jop
+1  A: 

I imagine the answer depends on your choice of DBMS. With Oracle, for example, one big database would definitely be preferable; in fact 1000 identical databases would be considered absurd and unmanageable.

Also, would you never have a need to run queries across users? e.g. find the user with the most products. Or are these really 1000 discrete "private" databases and no one has overall access to the data? Even then, Oracle for example offers "Virtual Private Database" to cater for that in a single database.
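That "user with the most products" query is trivial on a single shared database (hypothetical schema below, LIMIT as in MySQL/PostgreSQL), but needs cross-database plumbing in the one-DB-per-client model:

    -- which user owns the most products, across all tenants
    SELECT c.user_id, COUNT(*) AS product_count
    FROM clients c
    JOIN products p ON p.client_id = c.client_id
    GROUP BY c.user_id
    ORDER BY product_count DESC
    LIMIT 1;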

Tony Andrews
+1  A: 

The main reasons for a one-database-per-client strategy are security and manageability. While backup/restore on a single shared database rather than 100 client-specific DBs does get you a win, the shared approach has some downsides:

  • Users cannot directly report off the database without some additional security measure (such as views, see the sketch after this list) to prevent them from seeing each other's data. In the case of sensitive data this also becomes a compliance issue.

  • The entire application has to be aware of the security model, which adds some degree of complexity. Again, with sensitive data this has compliance implications.

  • System maintenance or support ticket tasks that involve one customer's data have a risk that an error will affect others' data.
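As a sketch of the view-based measure mentioned in the first bullet (hypothetical names; in practice the reporting user would be granted access only to the view, never to the base table):

    -- a per-tenant view exposing only one client's rows
    CREATE VIEW client_7_products AS
        SELECT product_id, name, price
        FROM products
        WHERE client_id = 7;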

The inherent compartmentalisation of multiple databases simplifies the application and eliminates entire classes of error, misconfiguration and security issues. If you build the application so it is amenable to automated configuration management and deployment to multiple environments you can get to something that is fairly easy to maintain even on multiple instances.

Additionally, with the data volumes and types of reporting you are describing, you might be better off building some sort of reporting subsystem or data mart that moves the reporting off the production server. Analytic reports of that sort are much more efficient on star schemas than the type of normalised schema you would use on a transactional database.
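A rough sketch of what the star-schema side of such a data mart might hold (the dimensions and measures here are purely illustrative):

    -- denormalised dimensions plus a narrow fact table
    CREATE TABLE dim_client (
        client_key  INT PRIMARY KEY,   -- surrogate key
        client_id   INT,
        user_id     INT,
        client_name VARCHAR(100)
    );
    CREATE TABLE dim_date (
        date_key  INT PRIMARY KEY,
        full_date DATE,
        month     INT,
        year      INT
    );
    CREATE TABLE fact_product_activity (
        date_key      INT NOT NULL REFERENCES dim_date(date_key),
        client_key    INT NOT NULL REFERENCES dim_client(client_key),
        product_count INT,
        total_value   DECIMAL(12,2)
    );

Aggregate reports then become scans of the fact table joined to small dimensions, rather than multi-way joins across a large normalised schema on the production server.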

ConcernedOfTunbridgeWells