Please forgive my long question. I have an idea for a design that I could use some comments on. Is it a good idea to do this? And what are the pitfalls I should be aware of? Are there other similar implementations that are better?

My situation is as follows:
I am working on a rewrite of a Windows Forms application that connects to a SQL Server 2008 database (earlier it was SQL Server 2005). The application is an "expert system" for an engineering company where we store structured data about constructions. We have control of all installations of the client software, and we have no external customers or users; they are all internal to the company, and they can all be trusted not to do anything malicious to the software or database.

The current design doesn't have too many tables (about 10 - 20), but some of them have millions of records that belong to several hundred constructions. The system's performance has been OK so far, but it is starting to degrade as we push the limits of the design.

As part of the rewrite I am considering splitting the database into one master database and several "child" databases where each describes one construction. Each child database should be of identical design. This should eliminate the performance problems we are seeing today since the data stored in each database would be less than one percent of the total data amount.

My concern is that instead of maintaining one database we will now have hundreds of databases that must be kept up to date. The system is constantly evolving as the company's requirements change (you know how it is), and while we try to plan ahead to reduce the number of changes, changes will come. So we will need a way to keep track of all database changes so they can be applied to the child databases. Updating the client application won't be a problem; we have good control of that aspect.

I am thinking of a change tracking system where we store the database scripts for all changes in a table in the master database. Each change gets a version number, and each child database stores its current version number. When the client program connects to a child database we check the child's version number against the master database's current version, and if there are patches with higher version numbers we run them to bring the child database up to the latest version.

As I see it this should work well. Any changes to the system will first be tested and validated before being committed as a new version of the database. The change will then be applied to a child database the first time a user opens it. I suppose we would open the database in exclusive mode while applying the changes, but as long as the changes aren't too frequent this should not be a problem.
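Roughly, the check-and-patch step could look like this (sketched here in Python with pyodbc just to show the idea; SchemaVersion and SchemaPatches are names I made up for the version number stored in each child database and the patch scripts stored in the master database):

import pyodbc

def upgrade_child_database(master_conn_str, child_conn_str):
    master = pyodbc.connect(master_conn_str)
    child = pyodbc.connect(child_conn_str)      # autocommit is off by default

    # Version the child database was last patched to.
    current = child.cursor().execute(
        "SELECT TOP 1 Version FROM SchemaVersion").fetchone()[0]

    # Every patch script in the master database the child has not seen yet.
    patches = master.cursor().execute(
        "SELECT Version, Script FROM SchemaPatches "
        "WHERE Version > ? ORDER BY Version", current).fetchall()

    # This is where we would also put the child database into exclusive
    # (single user) mode before touching anything.
    try:
        for version, script in patches:
            child.cursor().execute(script)
            child.cursor().execute(
                "UPDATE SchemaVersion SET Version = ?", version)
        child.commit()
    except Exception:
        child.rollback()
        raise

The real code would of course live in the client application, but that is the flow I have in mind.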

So what do you think? Will this work? Have any of you done something similar? Should we scrap the solution and go for the monolithic system instead?

+1  A: 

I have a similar situation here, though I use MySQL. Every database has a versions table that contains the version (simply an integer) and a short comment describing what changed in that version. I use a script to update the databases. Each database change is one function, or sometimes one change is made up of multiple functions. The functions carry the version number in their names. The script looks up the highest version number in a database and applies, in order, only the functions with a higher version number.

This makes it easy to update databases (just add new change functions) and allows me to quickly upgrade a recovered database if necessary (just run the script again).
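The driver script itself is not much code. A rough sketch of the idea (assuming each change function takes a database connection; the exact column names of the versions table are illustrative):

import re
import sys

def get_current_version(db):
    """Read the highest applied version from the versions table."""
    cur = db.cursor()
    cur.execute("SELECT COALESCE(MAX(version), 0) FROM versions")
    return cur.fetchone()[0]

def record_version(db, version, comment):
    """Insert the new version number and a short comment."""
    cur = db.cursor()
    cur.execute("INSERT INTO versions (version, comment) VALUES (%s, %s)",
                (version, comment))
    db.commit()

def pending_changes(current_version):
    """Collect every changeNNN function above current_version, lowest first."""
    module = sys.modules[__name__]
    found = []
    for name in dir(module):
        match = re.match(r"^change(\d+)$", name)
        if match:
            found.append((int(match.group(1)), getattr(module, name)))
    return [(version, fn) for version, fn in sorted(found)
            if version > current_version]

def migrate(db):
    for version, change in pending_changes(get_current_version(db)):
        change(db)
        record_version(db, version, change.__doc__)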

Even when you test the changes beforehand, this approach allows for defensive changes. If you make some heavy changes to a table and you want to play it safe:

def change103(...):
    "Create new table."
def change104(...):
    """Transfer data from old table to new table and make
       complicated changes in the process.
    """
def change105(...):
    "Drop old table"
def change106(...):
    "Rename new table to old table"

If something goes wrong in change104() (and it throws an exception), you can simply delete the already converted data from the new table, fix your change function and run the script again.

But I don't think that changing a database dynamically when a client connects is a good idea. Some changes can take quite a while, and the software that accesses a database should match the schema of that database, so you somehow have to keep them in sync. Maybe you could distribute a new software version first and then upgrade a database when a client actually starts using the new version. But I haven't tried that.

unbeknown
+1  A: 

Have you considered partitioning your large tables by 'construction'? This could alleviate some of the growing pains by splitting the storage for the tables across files/physical devices without needing to change your application.

Adding spindles (more drives) and performing a few hours of DBA work can often be cheaper than modifying/adapting software.
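For example, partitioning one of the big tables by construction could look roughly like this on SQL Server 2008 (partitioning requires Enterprise Edition, and every object name below is invented; the DDL is run through Python/pyodbc here only to keep the sketch self-contained):

import pyodbc

# Invented names: dbo.Measurements with a ConstructionId column.
STATEMENTS = [
    # Boundary values split the construction ids into ranges.
    """CREATE PARTITION FUNCTION pfConstruction (int)
           AS RANGE RIGHT FOR VALUES (100, 200, 300)""",

    # Map every partition to a filegroup (just PRIMARY here; separate drives help more).
    """CREATE PARTITION SCHEME psConstruction
           AS PARTITION pfConstruction ALL TO ([PRIMARY])""",

    # Building the clustered index on the scheme moves the table's storage into
    # the partitions (add WITH (DROP_EXISTING = ON) if the table already has one).
    """CREATE CLUSTERED INDEX IX_Measurements_ConstructionId
           ON dbo.Measurements (ConstructionId)
           ON psConstruction (ConstructionId)""",
]

conn = pyodbc.connect("DRIVER={SQL Server};SERVER=.;DATABASE=ExpertDb;"
                      "Trusted_Connection=yes", autocommit=True)
for ddl in STATEMENTS:
    conn.cursor().execute(ddl)

Queries that filter on ConstructionId then only touch the relevant partition, and the application code does not change.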

Otherwise, I'd agree with @heikogerlach and these similar posts:

How do I version my ms sql database

Mechanisms for tracking DB schema changes

How do you manage databases in development, test and production?

Ken Gentle
We're still considering optimizing the database instead of splitting it up, but there are other reasons I want to do this apart from performance. Thanks for the links, though; the last one especially was very useful!
Rune Grimstad
A: 

Better not to create additional databases. At first glance you may think that you'll get some performance gain, but in practice you get a support nightmare. Remember: what can break does break, sooner or later.

It is way simpler to perform and optimize queries in a single database. It is much easier to manage user permissions in a single database. It is much easier to make consistent backups of a single database.

Like KenG said, if you need to break up your large tables, consider partitioning them. And add some drives :)

But first, run SQL Profiler on your database and optimize your indexes and queries. Several million rows is usually not a big problem to handle (unless your customers need live totals over half of them, in which case no partitioning can help).
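As a quick first pass before firing up the Profiler, SQL Server already collects missing-index suggestions in its DMVs; something like this (connection string and database name are placeholders) lists the most promising candidates:

import pyodbc

QUERY = """
SELECT TOP 20
       d.statement AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns,
       s.user_seeks,
       s.avg_user_impact
FROM sys.dm_db_missing_index_details d
JOIN sys.dm_db_missing_index_groups g
     ON g.index_handle = d.index_handle
JOIN sys.dm_db_missing_index_group_stats s
     ON s.group_handle = g.index_group_handle
ORDER BY s.user_seeks * s.avg_user_impact DESC;
"""

conn = pyodbc.connect("DRIVER={SQL Server};SERVER=.;DATABASE=ExpertDb;"
                      "Trusted_Connection=yes")
for row in conn.cursor().execute(QUERY):
    print(row.table_name, row.equality_columns, row.user_seeks)

Treat the output as hints only; verify each suggestion against your actual query load before creating anything.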

Arvo
+1  A: 

I know that this is a crazy answer but here it goes...

I currently have a similar scenario where I need to keep control of database versions in multiple locations for a system using MS SQL Server.

What I am doing now is using Ruby on Rails ActiveRecord Migrations to keep control of database versions. Yes, I know that we are talking about Windows systems, but this works fine for me. (By the way, my system is programmed in VB and .NET.)

I have installed Rails on each server. When I need to update the database schema I copy the migration files to the server and run rake db:migrate, which updates the database to the latest version or rolls it back to a desired version.

As a side effect you will have a set of migration files for your database schema in a database-independent language (in this case Ruby) that you can apply to other database engines and that you can put under source control too.

I know that this is a strange solution that uses a totally different technology, but it does not hurt to learn new approaches. You can find additional information here.

I have become a better .NET programmer since I learned Ruby on Rails. I asked a question about this approach here before.

hectorsq