tags:

views:

65

answers:

2

So I'm currently working on rebuilding an existing website that is used internally at my company for project management, at heart it is a bug tracking utility that has some customer support and accounting operations linked into it.

Currently the database model is very repetitive, a good example of this is, currently a UserId is linked into a record (FK relationship into a user table that contains all the information about the user) and then all the information about the user also exists in the table.

I've been tasked with improving the website and the functionality of the model; however, I want to reduce the repetition of data in the website (is this normalization or is that the breaking apart of unlinked items into separate tables?). I'm not sure what the best method of doing this would be. I'm thinking of generating the creation scripts for the database and creating a new database project in VS to then modify the database, then generating some scripts to populate the new database model from the old database.

I plan on using the Entity Framework and ASP. NET MVC 2 to build the website as I think it provides the most flexible model moving forward for the modification and maintenance of the website.

The reason I ask all of this is because I'm very familiar with using databases and modifying existing ones to be used in applications and websites but I'm trying to discover the best way to build one.

I'm curious if there is any material on the best way to do this or if I should be using a different tool to do this with?

Edit: Providing more information on the model

There are 4 major areas that we have that are used:

  1. Cases (Bugs, Features, Working Tasks, Etc) 2 .Tickets (Tech Support Events)
  2. Errors (Errors Generated from our logging Library, Basically a stack trace with customer information)
  3. License (Keeps track of each customers License allows modification to those licenses)

These are the Objects that are intermixed and used throughout the above 4 major areas.

  1. Users (People who use the system)
  2. Customers (People who use our software)
  3. Stores (Places where our customers use our software)
  4. Products (Our Software)

Relationships

Cases: A Cases has to have a User, can have a Customer, Store, Error, Ticket and/or Product

Tickets A Ticket has to have a User and a Customer, can have a Store, Error and/or Product

Errors: A Error has to have a Product, Can Have a Case, Ticket, Store, and/or Product

Licenses: A Licenses has to have a Product and Customer, can have a Store

Like I said very basic website, with a not super complex database, if done correctly.

Currently the database has no FK constraints, replication of lots of information across each table and lots of extra tables that are duplicates with different names.

E.g.

Each Case type has a separate table so there is a FeatureRequest, Bug, Tasks, Completed, etc table that all contain the same information.

A: 

How long do you have, and how big is the database?

It's very difficult to answer this question black and white without being immersed in your environment and business case. It really doesn't seem like your limitation is technology wise, just to choose between solutions.

Re-creating is what programmers instinctively go for. However, in the "real world", sometimes we spend a lot of effort into something that isn't that used or wont last that long.

So food for thought. How long will it take you to re-do the database, how much will it cost. Will working with what's existent be sufficient for the functionality asked?

Lily
@Lily this is why I asked this question because I know that we all have the "I can do this better from scratch mode" and I want to make sure that it will actually be better, than just being different.
msarchet
+1  A: 

Normalization is about storing data without redundancy or anomalies.

One example of an anomaly could be when attributes about a user in your main table are not in sync with the users table. Someone changes information about that user in one table without reflecting the changes in the redundant copy. The problem is that it's hard to know which change is the correct one.

Some people think that normalization is just about breaking apart tables into littler tables, because that's what they see as the most common type of change. But that's not the goal of normalization. It's just by coincidence that most mistakes of non-normalization involve stuffing too much data into one table where multiple tables would be correct.

It's hard to answer your question about whether to modify your database in-place or whether to create a whole new database and migrate to it.

What I would do in your case is to design a properly normalized database, and then examine the differences between that and your existing database. Imagine what you would have to do for each difference, to change your old database to the new one, versus a data migration. It could be that only a few changes are needed, only dropping the redundant columns. Or it could be that some major rework is needed. It's impossible to tell until you do the work of creating a normalized data model so you can compare.

The bigger task might be to adapt your application code that uses the database. One way to ease this transition is to create database views on top of the normalized database, which mimic your old non-normalized database. That way hopefully you don't have to rewrite every bit of code in your app all at once, you can keep some of it the same at least until you can refactor the code.

Also having a good set of regression tests in place is ideal, so you can be sure your app still does all the tasks it is supposed to do, as you refactor the database and the code that uses the database.


Re your comment: You mention that you're adding new functionality to the user model at the same time. I would find it too confusing to try to do this simultaneously with refactoring. Refactoring typically does not change functionality, it only changes implementation. But refactoring adds value because it makes the code easier to maintain or debug, improves efficiency, or prepares you to make future functionality changes more easily.

I would recommend that you bit the bullet and add your new user model features to the old non-normalized database. It's good to get the benefit of new features in the short term, and also you need to develop those features first to understand them well enough to account for them in your big refactoring project.


Here are some suggestions for resources to help you truly understand what normalization means:

Here are a couple of resources for managing changes to a database:

Bill Karwin
@Bill Karwin this is super useful I'd been kind of debating doing the layout and seeing what I would have to do. When you say regression tests you mean basically we did a, b and got c on the old system, does the new system do this? Part of the problem with that is I'm reworking the general user model with this, it's not just oo let's make a newer tech version of this site.
msarchet
Regression tests allow you to verify that at least the previous functionality still works.
Bill Karwin
Thanks for the idea of 'making' the database the way I would like it to look then looking at what we have. I was spending so long looking at what I had that I wasn't able to determine if it was useful or not. This is giving me a much clearer picture
msarchet
I wish I could upvote your answer twice, because it improved even more with your links. Thanks for the help, this isn't a critical development stream so I have time to think about it. Part of the current problem is a majority of the current system isn't being used, since it doesn't work correctly and the site is broken in any browser that isn't IE 6 (shudder). This is why I'm doing this.
msarchet
Yikes. Best of luck with the database refactoring, but it sounds like you have your work cut out for you on the front-end too.
Bill Karwin
@Bill Karwin, that's actually the simple part once I get the database working correctly.
msarchet