I've built a nice website system that caters to the needs of a small niche market. I've been selling these websites over the last year by deploying copies of the software to my web server with Capistrano.

It occurs to me that the only differences between these websites are the database, the CSS file, and a small set of images used for each client's graphic design.

Everything else is exactly the same, or should be... Now that I have about 20 of these sites deployed, it is getting to be a hassle to keep them all updated with the same code. And this problem will only get worse.

I am thinking that I should refactor this system so that I can use one set of deployed Ruby code, dynamically selecting the correct database, etc., based on the URL of the incoming request.

It seems that there are two ways of handling the database:

  • using multiple databases, one for each client
  • using one database, with a client_id field in each table and an extra 'client' table (sketched below)
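
For the single-database option, I imagine every client-owned table would need a migration along these lines (a sketch only; 'items' stands in for any such table):

  class AddClientIdToItems < ActiveRecord::Migration
    def self.up
      # Each client-owned table gains a client_id column, indexed so
      # per-client lookups stay fast.
      add_column :items, :client_id, :integer
      add_index  :items, :client_id
    end

    def self.down
      remove_index  :items, :client_id
      remove_column :items, :client_id
    end
  end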

The multiple database approach would be the simplest for me at the moment, since I wouldn't have to refactor every model in my application to add the client_id field to all CRUD operations.

However, it would be a hassle to run 'rake db:migrate' against tens or hundreds of different databases every time I want to migrate. Obviously this could be done by a script, but it doesn't smell very good.
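
For what it's worth, the script might look something like this rake task (hypothetical; it assumes the master database has a 'clients' table recording each client's database name):

  namespace :db do
    desc "Run pending migrations against every client database"
    task :migrate_all_clients => :environment do
      master_spec = ActiveRecord::Base.configurations[RAILS_ENV]
      # find(:all) loads the client list eagerly, so reconnecting
      # inside the loop doesn't disturb the iteration.
      Client.find(:all).each do |client|
        spec = master_spec.merge("database" => client.database_name)
        ActiveRecord::Base.establish_connection(spec)
        ActiveRecord::Migrator.migrate("db/migrate/")
      end
      # Reconnect to the master database when finished.
      ActiveRecord::Base.establish_connection(master_spec)
    end
  end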

On the other hand, every client will have 20K-50K items in an 'items' table. I am worried about the speed of fulltext searches when a combined items table holds half a million or a million items. Even with an index on the client_id field, I suspect that searches would be faster if the items were separated into per-client databases.

If anyone has an informed opinion on the best way to approach this problem, I would very much like to hear it. Thanks much in advance...

-- John

+1  A: 

There are advantages to using separate DBs (including those you already listed):

  • With one shared DB, fulltext searches will become slow (depending on your server's capabilities) once you have millions of large text blobs to search.
  • Separate DBs keep table indexing fast for each client. In particular, it might upset some of your earlier-adopting clients if you take on a new, large client: suddenly their applications suffer for (to them) no apparent reason. Again, if you stay under your hardware's capacity, this might not be an issue.
  • If you ever drop a client, it'd be marginally cleaner to just pack up their DB than to remove all of their associated rows by client_id. And equally clean to restore them if they change their minds later.
  • If any clients ask for additional functionality that they are willing to pay for, you can fork their DB structure without modifying anyone else's.
  • For the pessimists: less chance that a single mistake destroys all clients' data rather than just one client's. ;)

All that being said, the single DB solution is probably better given:

  • Your DB server's capabilities make the large single table a non-issue.
  • Your clients' databases are guaranteed to remain identical.
  • You aren't worried about being able to keep everyone's data compartmentalized for purposes of archiving/restoring or in case of disaster.
Adam Bellaire
+1  A: 

I would go for a single database, using client IDs. You should be able to make the refactoring less painful by using some form of base model, and a named scope to restrict any queries to that client's ID.
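
For instance, something along these lines (a sketch; the class names are illustrative):

  # An abstract base class that all client-owned models inherit from.
  class ClientOwned < ActiveRecord::Base
    self.abstract_class = true
    named_scope :for_client, lambda { |client_id|
      { :conditions => { :client_id => client_id } }
    }
  end

  class Item < ClientOwned
  end

  # In a controller: only the current client's rows are ever touched.
  @items = Item.for_client(@client.id)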

You could use an indexing library, such as Ferret or something along those lines, to deal with full-text searches becoming slow. That's going to be an issue anyway once a single client's data set gets too big, so you may need to implement that either way.
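
With the acts_as_ferret plugin, for example, that might look roughly like this (the field names are placeholders):

  class Item < ActiveRecord::Base
    # Maintain a Ferret full-text index over the searchable columns.
    acts_as_ferret :fields => [:title, :description]
  end

  # Query the Ferret index instead of scanning the database:
  @results = Item.find_with_ferret("some search terms")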

Jon Wood
+2  A: 

Thanks for the great comments. I have decided to go with the multiple database approach. This is the easiest path for me, since I don't have to rework the entire application.

What I'm going to do is add a before_filter in application_controller.rb, so it applies to all controllers... something like this:

before_filter :client_db         # switch to client's db
Then, in the same file, I'll define the filter method:
  def client_db
    # Look up the client record. This read must hit the master database,
    # which is why Client gets its own pinned connection (see the note below).
    @client = Client.find(params[:client_id])

    # Clone the current environment's connection spec, point it at the
    # client's database, and reconnect the models to it.
    spec = ActiveRecord::Base.configurations[RAILS_ENV].clone
    spec["database"] = @client.database_name
    ActiveRecord::Base.establish_connection(spec)
  end

Then, a URL like example.com?client_id=12345 will select the correct database.
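
One thing to watch out for: establish_connection on ActiveRecord::Base re-points every model, including Client itself, so the next request's Client.find could end up querying whichever client database was selected last. Giving the Client model its own connection to the master database should avoid that; a minimal sketch:

  class Client < ActiveRecord::Base
    # Pin this model to the master database so it is unaffected when
    # ActiveRecord::Base is re-pointed at a client database per request.
    establish_connection RAILS_ENV
  end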

Since I am using Apache as a proxy in front of Mongrel, Apache will add the correct client_id to all requests, based on the client's website URL. So the client_id won't actually be part of the URL that users see. It will only be passed between Apache and Mongrel. I'm not sure if I'm explaining this properly, but it works and keeps things clean and simple.

If I decide I need to use a single database in the future, I can refactor all the code then. At the moment, this seems to be the simplest approach.

Anyone see any problem with this approach?

-- John

John
I'd route it differently, but that's just me: something like ":client_id/:controller/:action/:id" in routes.rb. That way the client ID is always part of the URL path, rather than a query parameter that could get dropped (like the "client_id=" bit).
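
For example (sketch):

  # config/routes.rb
  ActionController::Routing::Routes.draw do |map|
    # Every URL carries the client ID as its leading path segment.
    map.connect ':client_id/:controller/:action/:id'
  end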
The Wicked Flea