Normalization leads to many essential and desirable characteristics, including aesthetic pleasure. Besides that, it is also theoretically "correct". In this context, denormalization is applied as a compromise, a correction made to achieve performance. Is there any reason other than performance for which a database might be denormalized?
In terms of databases, is "Normalize for correctness, denormalize for performance" the right mantra?
The two most common reasons to denormalize are:
- Performance
- Ignorance
The former should be verified with profiling, while the latter should be corrected with a rolled-up newspaper ;-)
I would say a better mantra would be "normalize for correctness, denormalize for speed - and only when necessary"
Data warehouses in a dimensional model are often modelled in a (denormalized) star schema. These kinds of schemas are not (normally) used for online production or transactional systems.
The underlying reason is performance, but the fact/dimensional model also allows for a number of temporal features, like slowly changing dimensions, which are doable in traditional ER-style models but can be incredibly complex and slow there (effective dates, archive tables, active records, etc.).
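As a rough illustration (the table and column names here are invented, and the syntax assumes a PostgreSQL-flavoured dialect), a Type 2 slowly changing dimension in a star schema might look something like this:

    -- Dimension with Type 2 history: when an attribute changes, a new row is
    -- added, and effective/expiry dates mark which version was current when.
    CREATE TABLE dim_customer (
        customer_key   SERIAL  PRIMARY KEY,                    -- surrogate key
        customer_id    INT     NOT NULL,                       -- natural/business key
        customer_name  TEXT    NOT NULL,
        region         TEXT    NOT NULL,
        effective_date DATE    NOT NULL,
        expiry_date    DATE    NOT NULL DEFAULT DATE '9999-12-31',
        is_current     BOOLEAN NOT NULL DEFAULT TRUE
    );

    -- Fact rows reference the surrogate key, so each sale stays tied to the
    -- version of the customer that was current when the sale happened.
    CREATE TABLE fact_sales (
        sale_date    DATE NOT NULL,
        customer_key INT  NOT NULL REFERENCES dim_customer (customer_key),
        product_key  INT  NOT NULL,
        quantity     INT  NOT NULL,
        amount       NUMERIC(12,2) NOT NULL
    );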
Simplicity? Not sure if Steven is gonna swat me with his newspaper, but where I hang, sometimes the denormalized tables help the reporting/readonly guys get their jobs done without bugging the database/developers all the time...
Database normalization isn't just for theoretical correctness, it can help to prevent data corruption. I certainly would NOT denormalize for "simplicity" as @aSkywalker suggests. Fixing and cleaning corrupted data is anything but simple.
Denormalization normally means some improvement in retrieval efficiency (otherwise, why do it at all), but at a huge cost in complexity of validating the data during modify (insert, update, sometimes even delete) operations. Most often, the extra complexity is ignored (because it is too damned hard to describe), leading to bogus data in the database, which is often not detected until later - such as when someone is trying to work out why the company went bankrupt and it turns out that the data was self-inconsistent because it was denormalized.
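As a concrete (hypothetical) illustration of that validation burden, suppose the parent row carries a denormalized total that duplicates the sum of its line items. Nothing in the schema keeps the two in sync, so every insert, update and delete on the line items has to remember to maintain it, and you end up needing a query like this just to find out whether the data has already gone bad:

    -- Hypothetical schema with a denormalized total on the parent row.
    CREATE TABLE orders (
        order_id    INT PRIMARY KEY,
        order_total NUMERIC(12,2) NOT NULL   -- duplicates SUM(order_items.line_amount)
    );
    CREATE TABLE order_items (
        order_id    INT NOT NULL REFERENCES orders (order_id),
        line_no     INT NOT NULL,
        line_amount NUMERIC(12,2) NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );

    -- Consistency check that only exists because of the denormalization:
    -- any row returned here is exactly the self-inconsistent data described above.
    SELECT o.order_id, o.order_total, SUM(i.line_amount) AS items_total
    FROM   orders o
    JOIN   order_items i ON i.order_id = o.order_id
    GROUP  BY o.order_id, o.order_total
    HAVING o.order_total <> SUM(i.line_amount);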
I think the mantra should go "normalize for correctness, denormalize only when senior management offers to give your job to someone else", at which point you should accept the opportunity to go to pastures new since the current job may not survive as long as you'd like.
Or "denormalize only when management sends you an email that exonerates you for the mess that will be created".
Of course, this assumes that you are confident of your abilities and value to the company.
No way. Keep in mind that what you're supposed to be normalizing is your relations (logical level), not your tables (physical level).
Denormalized data is much more often found in places where not enough normalization was done.
My mantra is 'normalize for correctness, eliminate for performance'. RDBMSs are very flexible tools, but they are optimized for the OLTP situation. Replacing the RDBMS with something simpler (e.g. objects in memory with a transaction log) can help a lot.
You don't normalize for 'correctness' per se. Here is the thing:
Denormalized tables have the benefit of increased performance but require redundancy and more developer brain power.
Normalized tables have the benefit of reduced redundancy and greater ease of development, but they give up some performance.
It's almost like a classic balanced equation. So depending on your needs (such as how many users are hammering your database server), you should stick with normalized tables unless denormalization is really needed. It is, however, easier and less costly for development to go from normalized to denormalized than vice versa.
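To make the trade-off concrete, here is a hypothetical pair of designs for the same data (all names invented). The normalized version stores each fact once and pays with a join; the denormalized version answers the same read from a single table but repeats the customer name on every order row:

    -- Normalized: the customer's name lives in exactly one row...
    CREATE TABLE customers (
        customer_id   INT PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INT PRIMARY KEY,
        customer_id INT NOT NULL REFERENCES customers (customer_id),
        order_date  DATE NOT NULL,
        amount      NUMERIC(12,2) NOT NULL
    );
    -- ...so reads pay for a join.
    SELECT c.customer_name, o.order_date, o.amount
    FROM   orders o JOIN customers c ON c.customer_id = o.customer_id;

    -- Denormalized: the name is copied onto every order row...
    CREATE TABLE orders_denorm (
        order_id      INT PRIMARY KEY,
        customer_id   INT NOT NULL,
        customer_name TEXT NOT NULL,   -- redundant copy
        order_date    DATE NOT NULL,
        amount        NUMERIC(12,2) NOT NULL
    );
    -- ...so the same report is a single-table read,
    SELECT customer_name, order_date, amount FROM orders_denorm;
    -- ...but renaming a customer now means updating many rows.
    UPDATE orders_denorm SET customer_name = 'Acme Ltd' WHERE customer_id = 42;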
Don't forget that each time you denormalize part of your database, your capacity to adapt it further decreases, as the risk of bugs in the code increases, making the whole system less and less sustainable.
Good luck!
Mantras almost always oversimplify their subject matter. This is a case in point.
The advantages of normalizing are more than merely theoretical or aesthetic. For every departure from a normal form at 2NF and beyond, there is an update anomaly that occurs when you don't follow that normal form and that goes away when you do. Departure from 1NF is a whole different can of worms, and I'm not going to deal with it here.
These update anomalies generally fall into three categories: inserting new data, updating existing data, and deleting rows. You can generally work your way around these anomalies by clever, tricky programming. The question then is whether the benefit of using clever, tricky programming is worth the cost. Sometimes the cost is bugs. Sometimes the cost is loss of adaptability. Sometimes the cost is actually, believe it or not, bad performance.
If you learn the various normal forms, you should consider your learning incomplete until you understand the accompanying update anomaly.
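A quick, invented example of the kind of anomaly meant here: a line-items table that also stores the product's name violates 2NF, because product_name depends on product_id alone rather than on the whole key, and each class of anomaly falls straight out of that:

    -- 2NF violation: product_name depends on product_id alone, not on the full key.
    CREATE TABLE order_items (
        order_id     INT  NOT NULL,
        product_id   INT  NOT NULL,
        product_name TEXT NOT NULL,
        quantity     INT  NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );

    -- Update anomaly: renaming a product means touching many rows; miss any
    -- of them and the database holds two "names" for the same product.
    UPDATE order_items SET product_name = 'Widget Mk II' WHERE product_id = 7;

    -- Insert anomaly: a product that has never been ordered cannot be recorded here.
    -- Delete anomaly: deleting the last order line for product 7 also deletes the
    -- only record of its name.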
The problem with "denormalize" as a guideline is that it doesn't tell you what to do. There are myriad ways to denormalize a database. Most of them are unfortunate, and that's putting it charitably. One of the dumbest ways is to simply denormalize one step at a time, every time you want to speed up some particular query. You end up with a crazy mish mosh that cannot be understood without knowing the history of the application.
A lot of denormalizing steps that "seemed like a good idea at the time" turn out later to be very bad moves.
Here's a better alternative, when you decide not to fully normalize: adopt some design discipline that yields certain benefits, even when that design discipline departs from full normalization. As an example, there is star schema design, widely used in data warehousing and data marts. This is a far more coherent and disciplined approach than merely denormalizing by whimsy. There are specific benefits you'll get out of a star schema design, and you can contrast them with the update anomalies you will suffer because star schema design contradicts normalized design.
In general, many people who design star schemas are building a secondary database, one that does not interact with the OLTP application programs. One of the hardest problems in keeping such a database current is the so-called ETL (Extract, Transform, and Load) processing. The good news is that all this processing can be collected in a handful of programs, and the application programmers who deal with the normalized OLTP database don't have to learn this stuff. There are tools out there to help with ETL, and copying data from a normalized OLTP database to a star schema data mart or warehouse is a well understood case.
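A stripped-down sketch of what one such ETL load step might look like (all table and column names are hypothetical, and real ETL with dimension-key lookups, error handling, and incremental loads is considerably more involved): copy yesterday's orders from the normalized OLTP tables into a star-schema fact table, resolving natural keys to surrogate dimension keys along the way.

    -- Load step, run outside the OLTP application (PostgreSQL-flavoured syntax).
    INSERT INTO fact_sales (sale_date, customer_key, product_key, quantity, amount)
    SELECT  o.order_date,
            dc.customer_key,
            dp.product_key,
            i.quantity,
            i.quantity * i.unit_price
    FROM    orders       o
    JOIN    order_items  i  ON i.order_id = o.order_id
    JOIN    dim_customer dc ON dc.customer_id = o.customer_id AND dc.is_current
    JOIN    dim_product  dp ON dp.product_id  = i.product_id
    WHERE   o.order_date = CURRENT_DATE - 1;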
Once you have built a star schema, and if you have chosen your dimensions well, named your columns wisely, and especially chosen your granularity well, using this star schema with OLAP tools like Cognos or Business Objects turns out to be almost as easy as playing a video game. This permits your data analysts to focus on analysing the data instead of learning how the container of the data works.
There are other designs besides star schema that depart from normalization, but star schema is worth a special mention.
Normalization has nothing to do with performance. I can't really put it better than Erwin Smout did in this thread: http://stackoverflow.com/questions/1379340/what-is-the-resource-impact-from-normalizing-a-database
Most SQL DBMSs have limited support for changing the physical representation of data without also compromising the logical model, so unfortunately that's one reason why you may find it necessary to denormalize. Another is that many DBMSs don't have good support for multi-table integrity constraints, so as a workaround to implement those constraints you may be forced to put extraneous attributes into some tables.
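A common (hypothetical) instance of that last workaround: you want every order line to ship from the same warehouse as its parent order, but a CHECK constraint can't look at another table. So warehouse_id gets copied onto the line table, purely so a composite foreign key can enforce the rule:

    CREATE TABLE orders (
        order_id     INT PRIMARY KEY,
        warehouse_id INT NOT NULL,
        UNIQUE (order_id, warehouse_id)   -- target for the composite FK below
    );

    CREATE TABLE order_lines (
        order_id     INT NOT NULL,
        line_no      INT NOT NULL,
        warehouse_id INT NOT NULL,        -- extraneous attribute, duplicated only
                                          -- so the constraint becomes expressible
        PRIMARY KEY (order_id, line_no),
        FOREIGN KEY (order_id, warehouse_id)
            REFERENCES orders (order_id, warehouse_id)
    );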