Is it when you're trying to get data and there is no apparent easy way of doing it?

When you find something should be a table on its own?

What are the laws?

+2  A: 

When you're starting to question whether an SQL database needs more normalization.

Developer Art
+5  A: 

When you notice you have to repeat the same data, or when you start using single fields as arrays.
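
As a rough illustration of the "single field as an array" symptom, here is a minimal sketch using Python's built-in `sqlite3` module. The tables (`post_flat`, `post`, `post_tag`) and their data are made up for the example; the point is only that a comma-separated field becomes one row per value in a child table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unnormalized: one text field is being used as an array of tags.
    CREATE TABLE post_flat (id INTEGER PRIMARY KEY, title TEXT, tags TEXT);
    INSERT INTO post_flat VALUES (1, 'Intro to SQL', 'sql,beginner');
    INSERT INTO post_flat VALUES (2, 'Indexing', 'sql,performance');

    -- Normalized: one row per (post, tag) pair.
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE post_tag (
        post_id INTEGER NOT NULL REFERENCES post(id),
        tag     TEXT NOT NULL,
        PRIMARY KEY (post_id, tag)
    );
""")

# Migrate: split each packed field into separate rows.
flat_rows = conn.execute("SELECT id, title, tags FROM post_flat").fetchall()
for post_id, title, tags in flat_rows:
    conn.execute("INSERT INTO post VALUES (?, ?)", (post_id, title))
    conn.executemany(
        "INSERT INTO post_tag VALUES (?, ?)",
        [(post_id, tag.strip()) for tag in tags.split(",")],
    )

# Finding posts by tag is now an exact match instead of a string search.
sql_posts = [
    row[0]
    for row in conn.execute(
        "SELECT p.title FROM post p JOIN post_tag t ON t.post_id = p.id "
        "WHERE t.tag = ? ORDER BY p.id",
        ("sql",),
    )
]
print(sql_posts)
```

With the packed field, the same lookup would need something like `LIKE '%sql%'`, which also matches unrelated tags that merely contain the text.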

Ikke
+7  A: 

Check out Wikipedia. The article talks about database normalization and the different forms (first, second, third, etc.). Most of the time you should be aiming for at least third normal form. There are times when you want to relax the rules a bit (it may be too expensive to join multiple tables together, so you might want to de-normalize a bit), but for the most part third normal form is good.

TLiebe
Really, the three normal forms can be defined in formal language, and you do not need to do anything except take the rules and apply them.
St.Shadow
+3  A: 

While this is a somewhat snarky answer, when you discover that the data isn't sufficiently normalized. There are many resources on the web about the levels (or, more properly, "forms") of normalization, and they more completely describe the forms than I could here. First and second normal forms should be pretty much required. If you aren't at third (or, really, fourth) normal form, you need to have a strong justification as to why.

Check out the Wikipedia article on database normalization.

Adam Robinson
A: 

This is a pretty good article. Getting normal is a science, not an art. Now knowing when to DEnormalize... that's an art.

http://www.alvechurchdata.co.uk/hints-and-tips/softnorm.html

jtb
+2  A: 

Whenever you have a relational database.... <grin/>

No, actually there are laws, check out this Wikipedia link.

They are called the five normal forms, or something like that. They originally come from E. F. Codd, the guy who invented the relational model around 1970.

"The key, the whole key, and nothing but the key, so help me Codd"

This is a synopsis:

  1. First normal form (1NF): Table faithfully represents a relation and has no repeating groups
  2. Second normal form (2NF): No non-prime attribute in the table is functionally dependent on a part (proper subset) of a candidate key
  3. Third normal form (3NF): Every non-prime attribute is non-transitively dependent on every key of the table
     Boyce-Codd normal form (BCNF): Every non-trivial functional dependency in the table is a dependency on a superkey
  4. Fourth normal form (4NF): Every non-trivial multivalued dependency in the table is a dependency on a superkey
  5. Fifth normal form (5NF): Every non-trivial join dependency in the table is implied by the superkeys of the table
     Domain/key normal form (DKNF; Ronald Fagin, 1981): Every constraint on the table is a logical consequence of the table's domain constraints and key constraints
  6. Sixth normal form (6NF): Table features no non-trivial join dependencies at all (with reference to generalized join operator)
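
The definitions above are phrased in terms of functional dependencies, so here is a small hedged sketch of what "X determines Y" means on actual rows. The helper `fd_holds` and the order-line data are hypothetical, invented for this example. Note that sample data can only refute a dependency; whether it really holds is a property of the business rules, not of the rows you happen to have.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in determinant)
        value = tuple(row[col] for col in dependent)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical order lines; the candidate key is (order_id, product_id).
rows = [
    {"order_id": 1, "product_id": 10, "product_name": "Widget", "qty": 2},
    {"order_id": 1, "product_id": 11, "product_name": "Gadget", "qty": 1},
    {"order_id": 2, "product_id": 10, "product_name": "Widget", "qty": 5},
]

# product_name is determined by product_id alone, i.e. by only part of the
# key, which is exactly the situation 2NF forbids.
name_on_partial_key = fd_holds(rows, ["product_id"], ["product_name"])
# qty is not determined by product_id alone; it needs the whole key.
qty_on_partial_key = fd_holds(rows, ["product_id"], ["qty"])
print(name_on_partial_key, qty_on_partial_key)
```

In this made-up table the fix would be to move `product_name` into its own table keyed by `product_id`.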
Charles Bretana
Hah! Where'd that come from?? Thx!
Charles Bretana
A: 

What level of normalization are you currently at? If you can't answer that, I assume your database is a nasty mess. I always hit 3rd normal form on the initial design and de-normalize or normalize further if and when needed.

StarShip3000
A: 

When you have to search through huge amounts of data just to extract some basic info, e.g. what kinds of Product categories there are or something like that.

Mr. Brownstone
+1  A: 

3NF is generally all you need and it follows three rules:

Every column in the table should be dependent on:

  • the key (1NF),
  • the whole key (2NF),
  • and nothing but the key (3NF) (so help me Codd is the way that quote usually ends).

You can often "downgrade" to 2NF for performance reasons, provided you understand the implications and only when you strike problems, but 3NF should be the initial goal for all your designs.
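
To make "nothing but the key" concrete, here is a minimal sketch with Python's `sqlite3`. The `employee_flat`, `employee`, and `department` tables are assumptions made up for illustration: in the flat table `dept_name` depends on `dept_id` rather than on the key `emp_id`, so it is split out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 3NF: dept_name depends on dept_id, not on the key emp_id.
    CREATE TABLE employee_flat (
        emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, dept_name TEXT
    );
    INSERT INTO employee_flat VALUES
        (1, 'Ann', 10, 'Sales'), (2, 'Bob', 10, 'Sales'), (3, 'Cy', 20, 'Ops');

    -- 3NF: every non-key column depends only on its own table's key.
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT NOT NULL);
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY, name TEXT,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    );
    INSERT INTO department SELECT DISTINCT dept_id, dept_name FROM employee_flat;
    INSERT INTO employee SELECT emp_id, name, dept_id FROM employee_flat;
""")

# Renaming a department touches every matching employee row in the flat table...
flat_rows = conn.execute(
    "UPDATE employee_flat SET dept_name = 'Revenue' WHERE dept_id = 10"
).rowcount
# ...but exactly one row in the 3NF design.
norm_rows = conn.execute(
    "UPDATE department SET dept_name = 'Revenue' WHERE dept_id = 10"
).rowcount
print(flat_rows, norm_rows)
```

The flat design also lets two employees in the same department disagree about its name if an update misses a row; the 3NF design stores the name in one place only.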

paxdiablo
+1  A: 

As everyone else has said, you know when you start having (too many) duplicate columns in multiple tables.

That being said, it is sometimes useful to have redundant columns across multiple tables. This can reduce the number of JOINs you have to do in complicated queries. Just be careful to keep all the tables in sync, or you're just asking for trouble.
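
One possible way to keep such a redundant column in sync is a database trigger. The sketch below uses SQLite through Python's `sqlite3`; the `customer` and `orders` tables and the trigger name are hypothetical, and other database engines use different trigger syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

    -- customer_name is deliberately redundant, to avoid a JOIN when listing orders.
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        customer_name TEXT NOT NULL
    );

    -- Propagate renames from the source of truth to the redundant copy.
    CREATE TRIGGER customer_name_sync
    AFTER UPDATE OF name ON customer
    BEGIN
        UPDATE orders SET customer_name = NEW.name WHERE customer_id = NEW.id;
    END;

    INSERT INTO customer VALUES (1, 'Acme Ltd');
    INSERT INTO orders VALUES (100, 1, 'Acme Ltd');
""")

# Only the customer table is updated by the application; the trigger does the rest.
conn.execute("UPDATE customer SET name = 'Acme Inc' WHERE id = 1")
synced_name = conn.execute(
    "SELECT customer_name FROM orders WHERE id = 100"
).fetchone()[0]
print(synced_name)
```

This only covers renames; a real design would also have to decide what happens when an order is inserted with a stale name or moved to another customer.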

Loadmaster
Yes, denormalizing to 2NF is perfectly acceptable to gain performance. Triggers are a godsend to ensure these redundant columns are synchronized.
paxdiablo
A: 

I assume you're talking about a transactional database supporting an interactive application, but for what it's worth...

OLAP databases used exclusively for reporting and only updated by ETL processes may benefit from a less normalized structure. In these applications you accept the cost of redundant data storage and duplication for the performance benefit of fewer joins and the increased ease of use for (sometimes less technical) data analysts and business analysts.

Transactional databases should always be normalized to the extent practical (at least 3NF) and then selectively denormalized only as needed. And the need to denormalize should ideally be based on actual performance testing results.

John M Gant
A: 

Other people have pointed you to the formal rules for normalization. Here are some informal signs I look for that a database needs normalizing:

  1. If you have columns in a table the names of which differ only by a number (e.g. Phone1 and Phone2).

  2. If you have any columns in a table that should be filled in only when another column in the table is filled in.

  3. If updating a "fact" in the database (such as a street address) requires more than one UPDATE.

  4. If the same question could ever get two different answers depending on which table you get your information from.

  5. If the answer to any non-trivial question can be gotten from the database without JOINing at least two tables.

  6. If you have any quantity-based restrictions in the database other than "only 1 of something is allowed" (that is, "only one address is allowed" is okay, but "only two addresses are allowed" indicates a normalization problem).
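
The first guideline in the list above is easy to check mechanically. Here is a hedged sketch for SQLite using Python's `sqlite3`; the helper `numbered_column_groups` and the `contact` table are made up for the example, and the table name is interpolated into the statement directly, so it should only be called with trusted names.

```python
import re
import sqlite3
from collections import defaultdict

def numbered_column_groups(conn, table):
    """Group columns of `table` whose names differ only by a trailing number."""
    groups = defaultdict(list)
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    for info in conn.execute(f"PRAGMA table_info({table})"):
        match = re.fullmatch(r"(.*?)(\d+)", info[1])
        if match:
            # Compare stems case-insensitively so Phone1 and PHone2 group together.
            groups[match.group(1).lower()].append(info[1])
    return {stem: cols for stem, cols in groups.items() if len(cols) > 1}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT, Phone1 TEXT, Phone2 TEXT)"
)
suspects = numbered_column_groups(conn, "contact")
print(suspects)
```

A hit is only a hint: columns such as `address_line1` and `address_line2` match the pattern too, and whether they belong in a child table is a design decision.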

Larry Lustig