views:

52

answers:

3

I am working with a half dozen DBs. The DBs all have the same schemas, the same SPs, etc. According to the person who originally designed them, a big part of the motivation for using many DBs was efficiency: the alternative would have been to add a column to pretty much every table and SP in the database indicating which set of data was being worked in, resulting in one giant (and thus slower) DB instead of several small DBs. In place of a column indicating which set of data is being queried, the connection string is used to select which database is being hit.

The only reason I really dislike this organization is that it involves a lot of code duplication and thus hurts maintenance. For example, every time I wish to change a stored procedure, I need to run the alter statement on every database.

One solution I have considered is to combine all of the data into one big database, adding an extra column all over the place to indicate which database the data would have lived in before the merge. Then, I could partition all of the tables by this column's value. In theory, the underlying representation of the data would end up essentially the same as it is now, but without the redundancy in the indexes, schemas, SPs, etc.
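The combined layout described above can be sketched in T-SQL. All object names here (`pf_Tenant`, `ps_Tenant`, `dbo.Orders`) are illustrative, not from the actual schema:

```sql
-- One boundary value per former database; six databases -> six partitions.
CREATE PARTITION FUNCTION pf_Tenant (int)
    AS RANGE LEFT FOR VALUES (1, 2, 3, 4, 5);

-- All partitions on one filegroup for simplicity; they could be spread out.
CREATE PARTITION SCHEME ps_Tenant
    AS PARTITION pf_Tenant ALL TO ([PRIMARY]);

CREATE TABLE dbo.Orders
(
    TenantId  int           NOT NULL,  -- which former database this row came from
    OrderId   int           NOT NULL,
    OrderDate datetime      NOT NULL,
    Amount    decimal(10,2) NOT NULL,
    -- The partitioning column must be part of the clustered key.
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (TenantId, OrderId)
) ON ps_Tenant (TenantId);
```

Because every query would filter on `TenantId`, partition elimination keeps each tenant's scans confined to its own partition, which is what makes the merged database behave roughly like the separate small ones.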

My questions are these:

  1. Is this a good idea? Is there a better way to accomplish this?
  2. Are there any gotchas in doing this?
  3. Will this have any impact on performance?
+3  A: 

Everyone will deal with this at some point. My own personal opinion is that multiple databases are a pain in the backside and are not faster. They are a pain because of the maintenance headaches. Adding an extra column to each table as necessary will not slow your process down much if indexing is set properly, and your maintenance will be much easier. Plus, doing transactions across multiple DBs can be a hassle and involve MS DTC (the Distributed Transaction Coordinator).

BTW, using a single database is often called a multi-tenant database. You might want to research this a bit. But I would avoid multiple DB's like this if possible.

Randy Minder
The term, "Multi-tenant" is what I needed in order to research this.
Brian
+1  A: 

I'm of a different mind than Randy.

The multi-tenant model has its advantages.

For one, maintenance is not really much different whether you have 5 databases or 500. At some point you stop looking at maintenance of individual databases and start looking at the set. Yes, you must serialize backups, and you can't perform index reorg/rebuild across all databases at once.

But for code changes across multiple more-or-less identical databases, there are easy ways to script a lot of things to be done to multiple databases without really lifting an extra finger. I use a tool called SQLFarms Combine (now sold by JNetDirect), but there are other offerings such as RedGate MultiScript that I haven't played with.
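Even without a third-party tool, the same change can be scripted across all the databases. This sketch assumes the tenant databases share a name prefix (`Tenant`) and uses an illustrative procedure body:

```sql
-- Run one ALTER in every tenant database by executing sp_executesql
-- in each database's context via a three-part name.
DECLARE @db sysname, @proc nvarchar(400);
DECLARE @alter nvarchar(max) = N'
ALTER PROCEDURE dbo.GetOrderCount
AS
BEGIN
    SELECT COUNT(*) AS OrderCount FROM dbo.Orders;
END';

DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.databases WHERE name LIKE N'Tenant%';
OPEN dbs;
FETCH NEXT FROM dbs INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- EXEC with a variable procedure name runs in the named database's context.
    SET @proc = QUOTENAME(@db) + N'.sys.sp_executesql';
    EXEC @proc @alter;
    FETCH NEXT FROM dbs INTO @db;
END
CLOSE dbs;
DEALLOCATE dbs;
```

The undocumented `sp_MSforeachdb` procedure is another common way to do this, but an explicit cursor over `sys.databases` makes the name filter obvious and easy to audit.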

What I like most about the multi-tenant model is that when you grow and scale and suddenly need a new database server, it is very easy to move one of the tenants (say, the busiest or fastest growing) to the new server. If everybody is jammed into the same database, extracting only their data becomes quite difficult, especially if downtime is to be minimized. In the multi-tenant model, you can set up mirroring for just their database, and then switch the primary when you're ready.

Aaron Bertrand
Because of the way I am using this database, downtime is not really that big of a deal.
Brian
A: 

I'd be in favor of combining these databases. SQL Server has other facilities built in to address the potential performance downfalls of a very large database: additional indexes on a second physical disk, partitioning, clustering, etc. The headache and overhead of deploying schema updates to that many different databases is time-consuming, while it's easily handled in a single database. I think SQL Server scales really well in cases like this: let the database server do what it's designed to do and provide responsive access to your data, and focus on application design while leaving the storage model to SQL Server.

Also, though this isn't mentioned above, I'd suspect that there's some level of dynamic SQL involved in the applications that use this "many database" model: you've got to switch between databases based on something you know at runtime, so it can't be hard-coded into the application or a configuration file. That means either connection strings or actual SQL statements have to be generated on the fly, and that can be a really big security risk (read about "SQL injection" if you're unfamiliar with the potential risks of dynamic SQL).
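For completeness, if SQL ever does have to be built at runtime, parameterizing it with `sp_executesql` avoids the injection risk described above. The table and parameter names here are hypothetical:

```sql
-- Parameterized dynamic SQL: the tenant id is bound as a typed parameter,
-- never concatenated into the statement text.
DECLARE @tenantId int = 42;

EXEC sys.sp_executesql
    N'SELECT OrderId, OrderDate FROM dbo.Orders WHERE TenantId = @tid;',
    N'@tid int',
    @tid = @tenantId;
```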

rwmnau
Deploying schema updates to 10 databases is really no different than deploying them to 1; you just have to use the right tool. As for dynamic SQL: no, not if you design it right. I have a 500+ db multi-tenant model, and the only dynamic SQL is either isolated to a single database (because it has nothing to do with tenancy) or involved when a central database is collecting performance metrics, etc. The application has a configuration file that connects to a control database, and the control database tells it the connection string to use to do its real customer work.
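The control-database arrangement described above might look something like this; the table and column names are hypothetical:

```sql
-- One row per tenant in the control database; the application looks up
-- its real connection string here at startup.
CREATE TABLE dbo.Tenant
(
    TenantId         int           NOT NULL PRIMARY KEY,
    TenantName       nvarchar(128) NOT NULL UNIQUE,
    ConnectionString nvarchar(512) NOT NULL
);

-- The application's only hard-coded connection is to the control database;
-- everything else is data.
SELECT ConnectionString
FROM dbo.Tenant
WHERE TenantName = N'AcmeCorp';
```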
Aaron Bertrand
My point about the dynamic SQL is that it isn't coming from an application or based on user input; it is called from maintenance jobs within SQL Server and is much more controlled than the typical kind of dynamic SQL that most people are scare-mongered into fearing.
Aaron Bertrand
There is no dynamic SQL involved at all. The databases are almost completely independent of one another in terms of relationships, so there are no cross-database joins. As I mentioned before, switching is done purely by changing the connection string (i.e., changing the initial catalog).
Brian
I agree that consolidating the SQL in a central database is probably safer than generating SQL strings on the fly, but it always depends on how you're doing it. If the central database is compromised, a malicious user can still do whatever they want with your dynamic SQL process, though they'd first have to break into that database. My point is not that dynamic SQL is always bad, but that there's generally a better way to accomplish the same thing, and SQL Server will handle one large database the same way it handles a set of smaller ones; in some cases, even more efficiently.
rwmnau