views: 231 | answers: 7

The project I inherited mainly revolves around one unnormalized table. There are some attempts at normalization, but the necessary constraints weren't put in place.

Example: the Project table stores a client name (among other values), and there is also a clients table that just contains client names [no keys anywhere]. The clients table is only used as a pool of values offered to the user when adding a new project; it has no primary key, and there is no foreign key relating it to Project.

"Design patterns" such as this is common through the current state of the database and in the applications that use it. The tools I have my disposal are SQL Server 2005, SQL Server Management Studio, and Visual Studio 2008. My initial approach has been to manually determine which information needs normalization and running Select INTO queries. Is there a better approach than a case by case or anyway this could be automated?

Edit: Also, I've discovered that the "work order number" isn't an IDENTITY (autonumber, unique) field; the numbers are generated sequentially and each is unique to its work order. There are some gaps in the existing numbering, but all values are unique. Is the best approach for this to write a stored procedure that generates dummy rows before migrating?

A: 

I can't think of a sensible way of automating this; some human input is key in such refactorings if you want the output to be useful.

Re the work order number: assuming you want it to be an IDENTITY column going forward, can you perhaps fill the data, find the largest value, then use ALTER TABLE to make the column IDENTITY? I don't have any T-SQL tools to hand, so I can't test, unfortunately. Alternatively, just treat it as a natural key.
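
For reference: SQL Server won't let you alter an existing column into an IDENTITY directly, so one common workaround is to copy into a new table that already has the IDENTITY column, keeping the existing (unique, gappy) numbers via SET IDENTITY_INSERT and then reseeding. That also avoids generating dummy rows for the gaps. A rough sketch with placeholder names:

    -- Sketch only: WorkOrder / WorkOrderNumber / Description are placeholder names.
    CREATE TABLE dbo.WorkOrder_New
    (
        WorkOrderNumber INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        Description     VARCHAR(200)      NULL
    );
    GO

    -- Keep the existing numbers (gaps and all).
    SET IDENTITY_INSERT dbo.WorkOrder_New ON;

    INSERT INTO dbo.WorkOrder_New (WorkOrderNumber, Description)
    SELECT WorkOrderNumber, Description
    FROM   dbo.WorkOrder;

    SET IDENTITY_INSERT dbo.WorkOrder_New OFF;

    -- New rows continue from the largest existing number.
    DBCC CHECKIDENT ('dbo.WorkOrder_New', RESEED);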

Marc Gravell
A: 
  1. Create the new database the way you think it should be structured.
  2. Create an importError table in the new database with columns like "oldId" and "errorDesc"
  3. Write a straightforward, procedural, legible script that attempts to select a row from the old structure and insert it into the new structure. If an insert fails, log as specific an error as possible to the importError table (specifically, why the insert failed).
  4. Run the script.
  5. Validate the new data. Check whether there are errors logged to the importError table. If the data is invalid or there are errors, refactor your script and run it again, possibly modifying your new database structure where necessary.
  6. Repeat steps 1-5 until you have a solid conversion script.

The result of this process will be that you have: a) a new database structure that is validated against the old structure and tested for "pragmatism"; b) a log of potential issues you may need to code around (such as errors that you can't fix through your conversion because they would require a concession in your schema that you don't want to make).

(I might note that it's helpful to write the script in your scripting/programming language of choice, rather than in, say, SQL.)
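
If you do stay in T-SQL, a minimal set-based sketch of steps 2 and 3 might look like this: log what the new constraints would reject, then copy the rest. Every name below is a placeholder.

    -- Sketch only: database, table, and column names are placeholders.
    CREATE TABLE NewDb.dbo.importError
    (
        importErrorId INT IDENTITY(1,1) PRIMARY KEY,
        oldId         INT           NOT NULL,
        errorDesc     VARCHAR(2000) NOT NULL
    );
    GO

    -- Log rows the new constraints would reject (here: unknown client names).
    INSERT INTO NewDb.dbo.importError (oldId, errorDesc)
    SELECT p.ProjectId,
           'No matching client for name: ' + ISNULL(p.ClientName, '<null>')
    FROM   OldDb.dbo.Project p
    LEFT JOIN NewDb.dbo.Clients c ON c.ClientName = p.ClientName
    WHERE  c.ClientID IS NULL;

    -- Copy everything that does fit the new structure.
    INSERT INTO NewDb.dbo.Project (OldProjectId, ClientID)
    SELECT p.ProjectId, c.ClientID
    FROM   OldDb.dbo.Project p
    JOIN   NewDb.dbo.Clients c ON c.ClientName = p.ClientName;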

Rahul
+7  A: 

The best approach to migrating to a usable design? CAREFULLY

Unless you're willing to break (and fix) every application that currently uses the database, your options are limited, because you can't change the existing structure very much.

Before you begin, think carefully about your motivations. If you have an existing issue (a bug to fix, an enhancement to make), then go ahead, slowly. However, it's rarely worthwhile to monkey around with a working production system just to achieve an improvement that no one else will ever notice. Note that this can work in your favour: if there's an existing issue, you can point out to management that the most cost-effective way to fix things is to alter the database structure in this way. This means you have management support for the changes, and (hopefully) their backing if something goes pear-shaped.

Some practical thoughts ...

Make one change at a time ... and only one change. Make sure each change is correct before you move on. The old proverb of "measure twice, cut once" is relevant.

Automate, automate, automate ... never, ever make changes to the production system "live" using SQL Server Management Studio. Write SQL scripts that perform the entire change in one go; develop and test them against a copy of the database to make sure you get them right. Don't use production as your test server; it's too easy to run a script against production by accident. Use a dedicated test server (if the database is under 4 GB, SQL Server Express running on your own box will do).

Backups ... the first step in any script should be to back up the database, so that you've got a way back if something does go wrong.

Documentation ... if someone comes to you in twelve months asking why feature X of their application is broken, you'll need a history of the exact changes made to the database to help with diagnosis and repair. A good first step is to keep all your change scripts.

Keys ... it's usually a good idea to keep primary and foreign keys abstract: internal to the database and not exposed through the application. Things that look like keys at a business level (like your work order number) have a disturbing habit of having exceptions. Introduce your keys as additional columns with appropriate constraints, but don't change the definitions of the existing ones.
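
Pulling a few of these points together, the skeleton of one such change script might look roughly like this (a sketch only; the database name, backup path, and object names are placeholders):

    -- Sketch only: database name, backup path, and object names are placeholders.
    BACKUP DATABASE ProjectsDb
        TO DISK = 'D:\Backups\ProjectsDb_PreChange.bak'
        WITH INIT;
    GO

    USE ProjectsDb;
    GO

    BEGIN TRANSACTION;

    -- One change only: add an abstract key alongside the business "key",
    -- leaving the existing column definitions untouched.
    ALTER TABLE dbo.WorkOrder
        ADD WorkOrderID INT IDENTITY(1,1) CONSTRAINT PK_WorkOrder PRIMARY KEY;

    ALTER TABLE dbo.WorkOrder
        ADD CONSTRAINT UQ_WorkOrder_Number UNIQUE (WorkOrderNumber);

    COMMIT TRANSACTION;
    GO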

Good luck!

Bevan
Shucks! I was going to offer about half of your advice. I didn't even think of the other half!
Pulsehead
+1: Excellent answer. Good level of detail and clear prose.
John Sansom
A: 

I recommend using stored procedures to aid the translation process.

Specifically:

  1. One by one, replace queries used in the code with stored procedures. As part of the replacement, write unit (or integration) tests against the stored procedures directly. Consider a code-level StoredProcs helper class to consolidate database access there.
  2. After all queries are sprocs, you can refactor the database, using those unit tests to make sure you're not changing expected behavior.
  3. Added advantage: You'll have those unit tests to guard against future breakages.
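
For example, a query currently embedded in the application might be wrapped along these lines (a sketch; the procedure, table, and column names are invented for illustration):

    -- Sketch only: object names are invented for illustration.
    CREATE PROCEDURE dbo.GetProjectsForClient
        @ClientName VARCHAR(100)
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT p.ProjectId, p.ProjectName, p.ClientName
        FROM   dbo.Project p
        WHERE  p.ClientName = @ClientName;
    END
    GO

Once the application only calls the procedure, the tables behind it can be renormalized and the procedure body rewritten (say, to join a keyed clients table) while the unit tests confirm the result set hasn't changed.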
Jason Cohen
A: 

You did not say whether you need to keep the current application interface, or whether you are planning to rewrite any queries in the application.

Either way, I would

  • design the new schema
  • write T-SQL batches, using cursors where necessary, to migrate the data

Cursors, while not a first choice for operational queries, are great for this type of task, because you can go about it in a very structured way. These scripts tend to be very readable, which is important when things don't work right away and you have to go through a few iterations.
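
The skeleton of one such cursor-driven batch, just to show the shape (every name below is a placeholder):

    -- Sketch only: database, table, and column names are placeholders.
    DECLARE @oldId INT, @clientName VARCHAR(100), @clientId INT;

    DECLARE old_projects CURSOR LOCAL FAST_FORWARD FOR
        SELECT ProjectId, ClientName
        FROM   OldDb.dbo.Project;

    OPEN old_projects;
    FETCH NEXT FROM old_projects INTO @oldId, @clientName;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Resolve (or create) the client row in the new schema.
        SELECT @clientId = ClientID
        FROM   NewDb.dbo.Clients
        WHERE  ClientName = @clientName;

        IF @clientId IS NULL
        BEGIN
            INSERT INTO NewDb.dbo.Clients (ClientName) VALUES (@clientName);
            SET @clientId = SCOPE_IDENTITY();
        END

        INSERT INTO NewDb.dbo.Project (OldProjectId, ClientID)
        VALUES (@oldId, @clientId);

        SET @clientId = NULL;  -- reset before the next row
        FETCH NEXT FROM old_projects INTO @oldId, @clientName;
    END

    CLOSE old_projects;
    DEALLOCATE old_projects;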

cdonner
A: 

You can use SQL Server Integration Services (SSIS), which is part of SQL Server 2005, to help with the migration. It is designed to transfer data from one form to another:

http://en.wikipedia.org/wiki/SQL_Server_Integration_Services
http://www.microsoft.com/sqlserver/2005/en/us/integration-services.aspx

sabbour
A: 

Just to add a simple hint: when you have your entity-relationship diagram on a single A4 or A3 sheet in front of you, proper normalization will mean no many-to-many relationships. Check this book, or at least its site, as well.

YordanGeorgiev