When taking a database from a relatively un-normalized form and normalizing it, what, if any, changes in resource utilization might one expect?

For example, normalization often means that a few large tables are split into a larger number of smaller ones, so the database ends up with more tables; but many of them are quite small, allowing the frequently used ones to fit into memory better.

The higher number of tables also means that more joins are (potentially) needed to get at the data that was abstracted out, so one would expect some sort of impact from the additional joins the system needs to do.

So, what impact on resource usage (i.e. what will change) does normalizing an un-normalized database have?


Edit: To add a bit of context, I have an existing (i.e. legacy) database with over 300 horrible tables. About half of the data is TEXT and the other half is either char fields or integers. There are no constraints of any kind. The reason I ask is primarily to gather more information for convincing others that things need to change, and that there won't be a decrease in performance or maintainability. Unfortunately, those I have to convince know just enough about the performance benefits of a denormalized database to want to avoid normalization as much as possible.

+1  A: 

For one thing, you'll end up having to do result-set calculations. For example, if you have a Blog with a number of Posts, you could either do:

select count(*) from Post where BlogID = @BlogID

which is more expensive than

select PostCount from Blog where ID = @BlogID

and can lead to the SELECT N+1 problem, if you're not careful.

Of course, with the second option you have to deal with maintaining data integrity, but if the first option is painful enough, then you make it work.
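As a purely illustrative sketch (using the Blog(ID, PostCount) and Post(BlogID) columns assumed above, not a schema from the question), the "make it work" part of the second option could be a trigger along these lines in T-SQL:

    -- Hedged sketch: keep the denormalized Blog.PostCount in step with Post.
    -- Table and column names are assumptions for illustration only.
    CREATE TRIGGER trg_Post_MaintainPostCount
    ON Post
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Recalculate the count for every blog touched by this statement.
        UPDATE b
        SET    PostCount = (SELECT COUNT(*) FROM Post p WHERE p.BlogID = b.ID)
        FROM   Blog b
        WHERE  b.ID IN (SELECT BlogID FROM inserted
                        UNION
                        SELECT BlogID FROM deleted);
    END;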

Be careful you don't fall foul of premature optimisation. Do it in the normalised fashion, then measure performance against requirements, and only if it falls short should you look to denormalise.

Neil Barnwell
+3  A: 

There's a very simple answer to your question: it depends.

Firstly, I'd re-phrase your question as 'what is the benefit of denormalization', because normalization is something that should be done by default (as the result of a pure logical model), with denormalization then applied to very specific tables where performance is critical. The main problem with denormalization is that it can complicate data integrity management, but the benefits in some cases outweigh the risks.

My advice on denormalization: do it only when it really hurts, and make sure you have all scenarios covered when it comes to maintaining data integrity after any inserts, updates or deletes.
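To make "all scenarios covered" concrete, one hedged sketch (borrowing the illustrative Blog/Post naming from the first answer) is a reconciliation query that flags denormalized counts that have drifted out of sync:

    -- Hypothetical check: list blogs whose stored PostCount no longer matches
    -- the actual number of posts. Names are illustrative assumptions.
    SELECT b.ID, b.PostCount, COUNT(p.BlogID) AS ActualCount
    FROM   Blog b
    LEFT JOIN Post p ON p.BlogID = b.ID
    GROUP BY b.ID, b.PostCount
    HAVING b.PostCount <> COUNT(p.BlogID);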

Andrew from NZSG
This is similar to advice I've heard and tend to agree with, now that I have some experience under my belt - "Normalize until it hurts performance, and no more."
David Stratton
+11  A: 

This cannot really be answered in a general manner, as the impact will vary heavily depending on the specifics of the database in question and the apps using it.

So you basically stated the general expectations concerning the impact:

  1. Overall storage and memory demands should go down, as redundant data is removed
  2. CPU needs might go up, as queries might get more expensive (Note that in many cases queries on a normalized database will actually be faster, even if they are more complex, as there are more optimization options for the query engine)
  3. Development resource needs might go up, as developers might need to construct more elaborate queries (but on the other hand, you need less development effort to maintain data integrity); see the sketch after this list
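A rough sketch of points 2 and 3 (the OrderFlat, Customer and Orders names below are illustrative assumptions, not tables from the question): the normalized form needs a join for the full report, but a question that only touches customer columns can be answered from a much smaller table, which is where the extra optimization options come from:

    -- Denormalized: one wide table, customer data repeated on every order row.
    SELECT DISTINCT CustomerName, CustomerCity
    FROM   OrderFlat
    WHERE  CustomerCity = 'Chicago';

    -- Normalized: the same question touches only the small Customer table ...
    SELECT Name, City
    FROM   Customer
    WHERE  City = 'Chicago';

    -- ... while the full order report needs the (more elaborate) join.
    SELECT c.Name, o.OrderDate, o.Total
    FROM   Customer c
    JOIN   Orders o ON o.CustomerID = c.ID
    WHERE  c.City = 'Chicago';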

So the only real answer is the usual: it depends ;)

Note: This assumes that we are talking about cautious and intentional denormalization. If you are referring to the 'just throw some tables together as data comes along' approach that is way too common with inexperienced developers, I'd risk the statement that normalization will reduce resource needs on all levels ;)


Edit: Concerning the specific context added by cdeszaq, I'd say 'Good luck getting your point across' ;)

Obviously, with over 300 tables and no constraints (!), the answer to your question is definitely 'normalizing will reduce resource needs on all levels' (and probably very substantially), but:

Refactoring such a mess will be a major undertaking. If there is only one app using this database, it is already dreadful - if there are many, it might become a nightmare!

So even if normalizing would substantially reduce resource needs in the long run, it might not be worth the trouble, depending on circumstances. The main questions here are about long term scope - how important is this database, how long will it be used, will there be more apps using it in the future, is the current maintenance effort constant or increasing, etc. ...

Don't ignore that it is a running system - even if it's ugly and horrible, according to your description it is not (yet) broken ;-)

Henrik Opel
+2  A: 

I've found that normalization, in some cases, will improve performance.

Small tables read more quickly. A badly denormalized database will often have (a) longer rows and (b) more rows than a normalized design.

Reading fewer shorter rows means less physical I/O.
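To make the row-length point concrete, here is a hypothetical sketch (table and column names are illustrative assumptions only): the denormalized version repeats the customer columns on every order row, while the normalized version stores them once in a small table:

    -- Denormalized: long rows, customer data duplicated per order.
    CREATE TABLE OrderFlat (
        OrderID      INT PRIMARY KEY,
        OrderDate    DATETIME,
        Total        DECIMAL(10, 2),
        CustomerName VARCHAR(100),
        CustomerAddr VARCHAR(200),
        CustomerCity VARCHAR(100)
    );

    -- Normalized: short order rows plus one small customer table that is far
    -- more likely to stay cached in memory.
    CREATE TABLE Customer (
        ID   INT PRIMARY KEY,
        Name VARCHAR(100),
        Addr VARCHAR(200),
        City VARCHAR(100)
    );

    CREATE TABLE Orders (
        OrderID    INT PRIMARY KEY,
        CustomerID INT REFERENCES Customer(ID),
        OrderDate  DATETIME,
        Total      DECIMAL(10, 2)
    );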

S.Lott
+2  A: 

To underscore some points made by prior posters: Is your current schema really denormalized? The proper way (IMHO) to design a database is to:

  • Understand as best you can the system/information to be modeled
  • Build a fully normalized model
  • Then, if and as you find it necessary, denormalize in a controlled fashion to enhance performance

(There may be other reasons to denormalize, but the only ones I can think of off-hand are political ones: having to match the existing code, the developers/managers not liking it, etc.)

My point is, if you never fully normalized, you don't have a denormalized database; you've got an unnormalized one. And I think you can come up with more descriptive, if less polite, terms for those databases.

Philip Kelley
I can indeed think of other names for this database, and yes, it is an *unnormalized* database, as you say. Thank you for the clarification.
cdeszaq
+1  A: 

Normalized schemas tend to perform better for INSERT/UPDATE/DELETE because there are no "update anomalies" and the actual changes that need to be made are more localized.

SELECTs are mixed. Denormalization is essentially materializing a join. There's no doubt that materializing a join sometimes helps; however, materialization is often very pessimistic (probably more often than not), so don't assume that denormalization will help you. Also, normalized schemas are generally smaller and therefore might require less I/O. A join is not necessarily expensive, so don't automatically assume that it will be.
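As a small, hypothetical illustration of the localization point (reusing the illustrative OrderFlat/Customer naming from the sketches above, nothing from the question): a change that is a single row in the normalized schema has to be repeated across many rows in the materialized/denormalized one:

    -- Denormalized: the new address must be rewritten on every order row for
    -- that customer; missing any of them leaves the data inconsistent.
    UPDATE OrderFlat
    SET    CustomerAddr = '42 New Street'
    WHERE  CustomerName = 'Acme Ltd';

    -- Normalized: the change is localized to a single row.
    UPDATE Customer
    SET    Addr = '42 New Street'
    WHERE  Name = 'Acme Ltd';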

Jeff Davis
A: 

"Normalization" applies only and exclusively to the logical design of a database.

The logical design of a database and the physical design of a database are two completely distinct things. Database theory has always intended for things to be this way. The fact that the developers who overlook/disregard this distinction (out of ignorance, carelessness, laziness, or whatever other so-called-but-invalid "reason") are the vast majority does not make them right.

A logical design can be said to be normalized or not, but a logical design does not inherently carry any "performance characteristic" whatsoever. Just like 'c:=c+1;' does not inherently carry any performance characteristic.

A physical design does determine "performance characteristics", but then again a physical design simply does not have the quality of being "normalized or not".

This flawed perception of "normalization hurting performance" is really nothing else than concrete proof that all the DBMS engines that exist today are just seriously lacking in physical design options.

+1  A: 

I wanted to elaborate on point 3 of Henrik Opel's list. Development costs might go up, but they don't have to. In fact, normalization of a database should simplify or enable the use of tools like ORMs, code generators, report writers, etc. These tools can significantly reduce the time spent on the data access layer of your applications and let development move on to adding business value.

You can find a good StackOverflow discussion here about the development aspect of normalized databases. There were many good answers, comments and things to think about.

Travis Heseman
A: 

Erm... a point about the last answer (by Erwin): yes, this is old, but it still shows up in searches. What you describe does in fact carry algorithmic computational cost, so to claim that a logical design cannot be assessed for physical performance characteristics sounds entirely incorrect to me.

To say that, in the physical design, many parameters can change the actual performance circumstances (as people mention with regard to I/O etc.) - sure, of course. But I think that last answer was simply mistaken.

lakmilis