views: 75
answers: 3
Can someone please clarify data normalization? What are the different levels? When should I "de-normalize"? Can I over normalize? I have a table with millions of records, and I believe I over-normalized it, but I'm not sure.

+3  A: 

If you have a million columns, you probably under-normalized it.
What normalizing means is that

every non-key attribute "must provide a fact about the key, the whole key, and nothing but the key."

If you have a column that depends on anything but the key, you should normalize your table.
see here.

Added in reply to a comment:
If you have ProductID | ProductType | ProductTypeID, where ProductTypeID depends only on ProductType, you should make a new table for that:
ProductID | ProductTypeID, and in the other table: ProductTypeID | ProductTypeName.
So, to answer your question: "pertaining to a Product" isn't precise enough; in my example, the columns of the first (unnormalized) table all pertained to the product as well. Every column should depend only on ProductID (you might say each column describes the product and nothing but the product, not even something merely related to the product; that's the accurate formulation).
The number of rows, generally speaking, isn't relevant.
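The split described above can be sketched in an in-memory SQLite database (the table and column names follow the example; everything else here is illustrative):

```python
import sqlite3

# A minimal sketch of the table split described above, using the
# hypothetical Product / ProductType schema from the answer.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# ProductTypeName depends only on ProductTypeID, so it moves into its
# own table; Product keeps just a reference to it.
cur.execute("CREATE TABLE ProductType (ProductTypeID INTEGER PRIMARY KEY,"
            " ProductTypeName TEXT)")
cur.execute("CREATE TABLE Product (ProductID INTEGER PRIMARY KEY,"
            " ProductTypeID INTEGER REFERENCES ProductType(ProductTypeID))")

cur.execute("INSERT INTO ProductType VALUES (1, 'Widget')")
cur.executemany("INSERT INTO Product VALUES (?, ?)", [(10, 1), (11, 1)])

# The type name is now stored exactly once, no matter how many
# products share it.
count = cur.execute("SELECT COUNT(*) FROM ProductType"
                    " WHERE ProductTypeName = 'Widget'").fetchone()[0]
print(count)  # 1
```

Every non-key column in each table now depends only on that table's key, which is exactly the 3NF rule quoted above.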

Oren A
So, if I have a key called 'ProductID', then the columns within that table should only pertain to a Product? Is that what third normal form is? I have over a million records, not columns, btw.
Moderator71
Okay, so what do you call it when I need to create new tables because the number of rows in a table is growing rapidly (e.g. a Log table)? I need to create new tables for logging Login entries, Exception entries, etc. FYI, the Log table is very generic, containing LogID, LogReason, LoggedBy, LoggedOn, etc.
Moderator71
That's a different story. I suggest you ask a new question with all relevant details about your log, and what exactly you're trying to achieve.
Oren A
Okay Will do, Thanks
Moderator71
Here's my new question, if you can help me: http://stackoverflow.com/questions/3874603/log-table-with-millions-of-rows-now-what
Moderator71
Got your answers there. They have nothing to do with normalizing. gl.
Oren A
+4  A: 

Normalization is about reducing data duplication in a relational database. The most popular level is third normal form (it's the one described by "the key, the whole key, and nothing but the key"), but there are a lot of different levels, see the Wikipedia entry for a list of the main ones. (In practice people seem to think they're doing well to achieve third normal form.) Denormalizing means accepting more data duplication, typically in exchange for better performance.

Nathan Hughes
+1 for covering denormalization.
MaxVT
+1, good concise answer giving the purpose for normalization, not just the definition, i.e., why do we do it? I too like the denormalization mention.
DCookie
A: 

As others said, database normalization is about reducing data duplication and building more generic data models (ones that can easily answer queries that were unexpected at design time). Normalizing a database is a fairly formal process. Once you are experienced, you mostly follow data analysis methods and end up with a normalized database.

Normalizing a database is usually a good idea, but there is a catch. In many cases it involves creating new tables and JOIN relationships between them. JOINs are known to have a (very) high performance cost at runtime, so for big volumes of data you may want to denormalize.

Another cost is the need to write more complex queries to access the data you need, which can be a problem for SQL beginners. The best idea is probably to stick with normalization anyway (Third Normal Form is usually enough; as others said, there are several levels of normalization) and to become more skilled at SQL.
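A sketch of those "more complex queries" (hypothetical names, reusing the product/type split from the earlier answers): after normalization, even a simple lookup needs a JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE ProductType (ProductTypeID INTEGER PRIMARY KEY,
                              ProductTypeName TEXT);
    CREATE TABLE Product (ProductID INTEGER PRIMARY KEY,
                          ProductTypeID INTEGER
                              REFERENCES ProductType(ProductTypeID));
    INSERT INTO ProductType VALUES (1, 'Widget');
    INSERT INTO Product VALUES (10, 1);
""")

# In a denormalized table this would be a one-table SELECT; after
# normalization the same read requires a JOIN across both tables.
row = cur.execute("""
    SELECT p.ProductID, t.ProductTypeName
    FROM Product AS p
    JOIN ProductType AS t ON t.ProductTypeID = p.ProductTypeID
    WHERE p.ProductID = ?
""", (10,)).fetchone()
print(row)  # (10, 'Widget')
```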

kriss