views: 1285
answers: 14

When creating a database structure, what are good guidelines to follow or good ways to determine how far a database should be normalized? Should you create an un-normalized database and split it apart as the project progresses? Should you create it fully normalized and combine tables as needed for performance?

+1  A: 

I believe that starting with an un-normalized database and moving toward normalized as you progress is usually the easiest way to get started. As for how far to normalize, my philosophy is to normalize until it starts to hurt. That may sound a little flippant, but it is generally a good way to gauge how far to take it.

GrizzlyGuru
+3  A: 

Jeff has a pretty good overview of his philosophy on his blog: Maybe normalization isn't normal. The main thing is: don't overdo normalization. But I think an even bigger point to take away is that it probably doesn't matter too much. Unless you're running the next Google, you probably won't notice much of a difference until your application grows.

Jason Baker
+6  A: 

Database normalization, I feel, is an art form.

You don't want to over-normalize your database, because you will end up with too many tables, and queries for even simple objects will take longer than they should.

A good rule of thumb I follow is to normalize information that is repeated over and over again.

For example, if you are creating a contact management application, it would make sense to have Address (Street, City, State, Zip, etc.) as its own table.

However, if you have only two types of contacts, business or personal, do you need a contact type table when you know you are only going to have two? For me, no.
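
A minimal sketch of that split (table and column names are just illustrative):

    CREATE TABLE Address (
        AddressId INT PRIMARY KEY,
        Street    VARCHAR(100),
        City      VARCHAR(50),
        State     CHAR(2),
        Zip       VARCHAR(10)
    );

    CREATE TABLE Contact (
        ContactId   INT PRIMARY KEY,
        Name        VARCHAR(100),
        AddressId   INT REFERENCES Address (AddressId),
        -- Only two known contact types, so a constrained column can stand in
        -- for a separate ContactType lookup table.
        ContactType VARCHAR(10) CHECK (ContactType IN ('Business', 'Personal'))
    );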

I would start by figuring out the data types you need. Use a modeling program, like Visio, to help. You don't want to start with a non-normalized database, because you will eventually have to normalize it. Start by putting objects into their logical groupings; as you see data repeated, move that data into a new table. I would keep up that process until you feel you have the database designed.

Let testing tell you whether you need to combine tables. A well-written query can cover for any over-normalization.

David Basarab
+13  A: 

You want to start by designing a normalized database, up to third normal form. As you develop the business logic layer you may decide you have to denormalize a bit, but never, never go further than that: always stay first and second normal form compliant. You want to denormalize for simplicity of code, not for performance. Use indexes and stored procedures for that :)
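
For instance (hypothetical table and column names), a slow join is usually better answered with an index than by merging tables:

    -- Speed up joins and lookups on Orders.CustomerId instead of folding
    -- customer columns into the Orders table.
    CREATE INDEX IX_Orders_CustomerId ON Orders (CustomerId);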

The reason not "normalize as you go" is that you would have to modify the code you already have written most every time you modify the database design.

There are a couple of good articles:

http://www.agiledata.org/essays/dataNormalization.html

http://codebetter.com/blogs/raymond.lewallen/archive/2006/01/04/136196.aspx

sergeb
I know your answer is 1+ year old, but the 2nd link is broken. (+1 still).
javamonkey79
I'm sorry the 2nd link seems to be "D-E-D dead"... I emailed Raymond, will let you know if I hear from him.
sergeb
+7  A: 

@GrizzlyGuru A wise man once told me "normalize till it hurts, denormalize till it works".

It hasn't failed me yet :)

I disagree about starting with it in un-normalized form, however; in my experience it's been easier to adapt your application to deal with a less normalized database than a more normalized one. It could also lead to situations where it's working "well enough", so you never get around to normalizing it (until it's too late!)

AlexCuse
A: 

Often if you normalize as far as your other software will let you, you'll be done.

For example, when using Object-Relational mapping technology, you'll have a rich set of semantics for various many-to-one and many-to-many relationships. Under the hood that will give you join tables with what are effectively two-column primary keys. While relatively rare, true normalization can give you relations whose keys span three or more columns. In cases like this, I prefer to stick with the O/R mapping and roll my own code to avoid the various DB anomalies.
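
As a rough illustration (hypothetical names), the many-to-many case most O/R tools handle well boils down to a join table whose primary key is the pair of foreign keys:

    CREATE TABLE AuthorBook (
        AuthorId INT REFERENCES Author (AuthorId),
        BookId   INT REFERENCES Book (BookId),
        PRIMARY KEY (AuthorId, BookId)  -- composite key of two columns
    );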

Purfideas
A: 

Having a normalized database will give you the most flexibility and the easiest maintenance. I always start with a normalized database and then un-normalize only when there is a real-life problem that needs addressing.

I view this similarly to code performance: write maintainable, flexible code, and make compromises for performance only when you know there is a performance problem.

jase
A: 

I agree that it is typically better to start out with a normalized DB and then denormalize to solve very specific problems, but I'd probably start at Boyce-Codd Normal Form instead of 3rd Normal Form.
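
For anyone wondering what the extra step buys you, the textbook case (hypothetical schema) is a table where each instructor teaches exactly one course: Instructor determines Course but Instructor is not a key, so the table is in third normal form yet not in Boyce-Codd, and the BCNF version splits that dependency out:

    -- In 3NF but not BCNF: Instructor -> Course holds, yet Instructor is not
    -- a candidate key, so each instructor's course is repeated per student.
    CREATE TABLE Enrollment3NF (
        StudentId  INT,
        Course     VARCHAR(50),
        Instructor VARCHAR(50),
        PRIMARY KEY (StudentId, Course)
    );

    -- BCNF decomposition: the Instructor -> Course fact is stored only once.
    CREATE TABLE Teaches (
        Instructor VARCHAR(50) PRIMARY KEY,
        Course     VARCHAR(50)
    );

    CREATE TABLE EnrollmentBCNF (
        StudentId  INT,
        Instructor VARCHAR(50) REFERENCES Teaches (Instructor),
        PRIMARY KEY (StudentId, Instructor)
    );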

Hank Gay
+5  A: 

Be wary of people telling you to denormalize for performance reasons! This is only true if you have a very primitive database engine.

The point of normalization is (basically) to reduce redundancy. If your database is not normalized, the redundant pieces of data may get out of sync, leading to data corruption. A normalized database has a logical model where this kind of data corruption is not possible.

Once you have normalized the database, you might want to create views, which are basically stored joins. In many cases you want to work with views rather than the base tables. A view looks like a denormalized table, but the difference is that it is still not possible to introduce corruption, because the base tables are normalized.

Now there may be some performance issues with views, because the engine has to join several tables, which is usually slower than reading from a single table. This is why some people denormalize. However, in modern database engines you can create a materialized view (also called an indexed view), which basically means that the engine maintains a precalculated view. Querying it is as fast as querying a denormalized table, but you are still protected against corruption due to redundancy. You can create several materialized views based on the same base tables.
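
A rough sketch of the idea, using SQL Server-style indexed-view syntax (the table and view names are made up; other engines spell materialized views differently):

    -- The base tables stay normalized; the view presents the joined shape.
    CREATE VIEW dbo.OrderSummary
    WITH SCHEMABINDING
    AS
    SELECT o.OrderId, o.OrderDate, c.CustomerName
    FROM dbo.Orders o
    JOIN dbo.Customers c ON c.CustomerId = o.CustomerId;

    -- A unique clustered index materializes the view: reads become roughly as
    -- fast as a flat table, while the normalized base tables still prevent
    -- redundant copies from drifting out of sync.
    CREATE UNIQUE CLUSTERED INDEX IX_OrderSummary ON dbo.OrderSummary (OrderId);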

JacquesB
A: 

Just try to use common sense.

Also, some say - and I have to agree with them - that if you find yourself joining 6 (the magic number) tables together in most of your queries (not counting reporting-related ones), then you might consider denormalizing a bit.

snomag
+1  A: 

I agree that you should normalise as much as possible and only denormalise if absolutely necessary for performance. And with materialised views or caching schemes this is often not necessary.

The thing to bear in mind is that by normalising your model you are giving the database more information on how to constrain your data, so that you can remove the risk of the update anomalies that can occur in incompletely normalised models.

If you denormalise, then you either need to live with the fact that you may get update anomalies, or you need to implement the constraint validation yourself in your application code. That takes away a lot of the benefit of using a DBMS, which lets you define these constraints declaratively.
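
As a small, made-up illustration, a foreign key is the kind of declarative constraint a normalised design lets the DBMS enforce for you, instead of re-checking it in application code:

    CREATE TABLE Customer (
        CustomerId INT PRIMARY KEY,
        Name       VARCHAR(100)
    );

    CREATE TABLE CustomerOrder (
        OrderId    INT PRIMARY KEY,
        CustomerId INT NOT NULL,
        -- The DBMS rejects any order that points at a non-existent customer;
        -- no application-side validation is needed.
        CONSTRAINT FK_Order_Customer
            FOREIGN KEY (CustomerId) REFERENCES Customer (CustomerId)
    );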

So assuming the same quality of code, denormalising may not actually give you better performance.

Another thing to mention is that hardware is cheap these days so throwing extra processing power at the problem is often more cost effective than accepting the potential costs of cleaning up corrupted data.

Simon Collins
A: 

The original poster never described in what situation the database will be used. If it's going to be any type of data warehousing project, where at some point you will need OLAP cubes processing data for some front end, it would be wiser to start off with a star schema (fact tables plus dimension tables) rather than looking into normalization. The Kimball books will be of great help in this case.
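
A bare-bones star-schema sketch (hypothetical warehouse tables), as opposed to a normalized OLTP design:

    -- Dimension tables: wide, descriptive, and deliberately denormalized.
    CREATE TABLE DimDate (
        DateKey      INT PRIMARY KEY,
        FullDate     DATE,
        MonthName    VARCHAR(20),
        CalendarYear INT
    );

    CREATE TABLE DimProduct (
        ProductKey  INT PRIMARY KEY,
        ProductName VARCHAR(100),
        Category    VARCHAR(50)
    );

    -- Fact table: one row per measured event, keyed by the dimensions.
    CREATE TABLE FactSales (
        DateKey    INT REFERENCES DimDate (DateKey),
        ProductKey INT REFERENCES DimProduct (ProductKey),
        Quantity   INT,
        Amount     DECIMAL(12, 2)
    );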

Jedidja
A: 

Don't forget The mother of all database normalization debates on Coding Horror (summarized on the High Scalability blog).

Assaf Lavie