How would you design a hosted web application? I'm looking at applications like Basecamp, Campaign Monitor, Freshbooks, etc... where users can sign up online and the application is hosted for them.

  1. Would you use 1 big database to store all your customers' data, or would you handle data differently? Would you use more than 1 database? Would you make a database for each customer?
  2. Would you duplicate your code base for each signup/customer or would you use 1 codebase to handle all customers?
  3. Are there other design elements I should think about?
  4. Any web sites or books out there that talk about this?

Edit: I found an MSDN article that discussed multi-tenant Data Architecture: http://msdn.microsoft.com/en-us/library/aa479086.aspx#mlttntda%5Ftopic4

+1  A: 

http://highscalability.com/

Matt
Thanks for the link!
metanaito
+2  A: 

Refer to 37signals -- they are experts in this field and have a lot of articles where they answer community questions (many like yours should come up).

High Scalability = 37signals Architecture

Ask 37signals: How do you process credit cards?

Regarding the number of databases, here is David Heinemeier Hansson in "What do you want to know?":

Some technical answers…

Lance, all our scheduled billing operations are automated. Anything short of that would drive us insane. It’s especially important to make sure that contingency handling is in place for failing credit cards. Last I looked, I believe 5% of our charges bounced thanks to credit cards that were expired, over the limit, or closed. Be sure to handle that gracefully.

We just use Authorize.net and a separate credit card application (tiny app developed in Rails and used by the other apps on the internal network through REST ) that keeps numbers secure.

Warren, we run free and pay accounts on the same database. It’s one database per application. One database per account is normally a really, really bad idea. Usually the data is fairly normalized, but we’re definitely not religious about it. I generally value my source code over my schema. So if I can get better/prettier source code by bending a schema, I’ll typically do that. But start from normalized and denormalize as performance or code structure demands it.

Jason, we use email for SMS. All US carriers have an email-to-SMS gateway.

Jake Good, ahh, the good ol’ “but does it scale” question. I answered that a couple of years back. Nothing has changed for us since then. We manage millions and millions of dynamic requests every day without even resorting to much caching (most screens in most of our applications are different on a per-user basis, so traditional caching schemes are harder to apply).

There are many other Rails applications out there managing tens of millions of daily requests. All follow more or less the same Shared Nothing approach. All the techniques for scaling high and tall are out there. It’s hardly a turn-key solution, but anything that promises to be that is usually just full of it.
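
The advice in the quote about handling failed charges gracefully can be sketched roughly as follows. This is only an illustration, not 37signals' code: `CardDeclined`, `run_billing`, `BillingReport`, and `fake_charge` are hypothetical names, and the real gateway call is replaced by a stand-in function. The point is that one declined card is recorded for follow-up instead of aborting the whole scheduled run.

```python
from dataclasses import dataclass, field


class CardDeclined(Exception):
    """Hypothetical error raised by a gateway wrapper when a charge is refused."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


@dataclass
class BillingReport:
    paid: list = field(default_factory=list)    # account ids charged successfully
    failed: dict = field(default_factory=dict)  # account id -> decline reason


def run_billing(accounts, charge):
    """Charge every (account_id, amount_cents) pair; never stop on a decline."""
    report = BillingReport()
    for account_id, amount_cents in accounts:
        try:
            charge(account_id, amount_cents)
        except CardDeclined as exc:
            # Record the failure so the customer can be notified and retried later.
            report.failed[account_id] = exc.reason
        else:
            report.paid.append(account_id)
    return report


def fake_charge(account_id, amount_cents):
    """Stand-in for a real payment gateway call (assumption for this sketch)."""
    if account_id == "acct-2":
        raise CardDeclined("expired")


report = run_billing(
    [("acct-1", 2400), ("acct-2", 2400), ("acct-3", 4900)], fake_charge
)
print(report.paid, report.failed)
```

What happens to the entries in `failed` (retry schedule, email to the customer, account downgrade) is a product decision the quote does not spell out.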

Abi Noda
Interesting. I just read an article that seems to say isolated databases are preferred if you can afford them, and that they are easier to develop against. A shared database can handle more customers per server but is more difficult to develop against (I guess because every query has to make sure a customer can't see other customers' data).
metanaito
See http://stackoverflow.com/questions/69128/saas-database-design-multiple-databases-split
Abi Noda
+2  A: 

If you're only talking about thousands of customers (vs hundreds of thousands or millions) then the difference is pretty minimal unless you know you have tables that might have thousands of rows per customer or more. Then your design might change.

The normal setup for a relational datastore is to put a customer_id foreign key on most of your tables. Then just don't show that data to anyone but that customer (or to someone else the customer has explicitly granted permission).
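
A minimal sketch of that setup, assuming SQLite and made-up table names (`customers`, `projects`) purely for illustration. The idea is that every read goes through a helper that takes the current customer's id, so a query can never forget the `customer_id` filter.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database shared by all customers (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE projects (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(id),
            title TEXT NOT NULL
        );
        CREATE INDEX projects_by_customer ON projects(customer_id);
        """
    )
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )
    conn.executemany(
        "INSERT INTO projects (customer_id, title) VALUES (?, ?)",
        [(1, "Website"), (1, "Billing"), (2, "Launch")],
    )
    return conn


def projects_for(conn, customer_id):
    """List only the rows owned by the given customer."""
    rows = conn.execute(
        "SELECT title FROM projects WHERE customer_id = ? ORDER BY id",
        (customer_id,),
    )
    return [title for (title,) in rows]


def get_project(conn, customer_id, project_id):
    """Fetch one project by id, but only if it belongs to this customer."""
    row = conn.execute(
        "SELECT title FROM projects WHERE id = ? AND customer_id = ?",
        (project_id, customer_id),
    ).fetchone()
    return row[0] if row else None


conn = open_demo_db()
print(projects_for(conn, 1))    # Acme's projects only
print(get_project(conn, 1, 3))  # project 3 belongs to Globex, so nothing is returned
```

Note that `get_project` filters on both the row id and the customer id; looking a record up by id alone is the usual way one customer ends up seeing another's data.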

Don't worry too much about RDBMS scaling issues until it looks like you might start having multiple millions of rows in one table. Then it might be time to investigate a distributed key/value store. But keep in mind that that sort of problem is the good kind of problem to have, because presumably it means that you're making a ton of cash.

In short, cross the scaling bridge when you come to it. Design things to the best of your current ability, but beyond that, premature optimization is the root of all evil.

Bob Aman