views:

15156

answers:

4

Hey everyone,

I am trying to really wrap my head around the database normalization concept. I understand that it is used to avoid duplicate/redundant data, and it involves creating separate tables for certain things, but I'm not quite sure I understand 1NF, 2NF and 3NF fully. I have read some of the questions on here regarding normalization in general, as well as the wiki article, but I'd really appreciate a bit more help.

Can someone please give an example of 1NF, 2NF and 3NF in plain english?

Thanks very much,

+10  A: 

I've never had a good memory for exact wording, but in my database class I think the professor always said something like:

The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF].

Chris Shaffer
...so help me Codd. http://en.wikipedia.org/wiki/Ted_Codd
Smashery
+4  A: 

Here's a quick, admittedly butchered response, but in a sentence:

1NF : Your table is organized as an unordered set of data, and there are no repeating columns.

2NF: You don't repeat data in one column of your table because of another column.

3NF: Every column in your table relates only to your table's key -- you would't have a column in a table that describes another column in your table which isn't the key.

For more detail, see wikipedia...

Dave Markle
+27  A: 

1NF is the most basic of normal forms - each cell in a table must contain only one piece of information, and there can be no duplicate rows.

2NF and 3NF are all about being dependent on the primary key. Recall that a primary key can be made up of multiple columns. As Chris said in his response:

The data depends on the key [1NF], the whole key [2NF] and nothing but the key [3NF] (so help me Codd).

2NF

Say you have a table containing courses that are taken in a certain semester, and you have the following data:

|-----Primary Key----|               uh oh |
                                           V
CourseID|  Semester  |  #Places   |  Course Name |
-------------------------------------------------|
IT101   |   2009-1   |  100       | Programming  |
IT101   |   2009-2   |  100       | Programming  |
IT102   |   2009-1   |  200       | Databases    |
IT102   |   2010-1   |  150       | Databases    |
IT103   |   2009-2   |  120       | Web Design   |

This is not in 2NF, because the fourth column does not rely upon the entire key - but only a part of it. The course name is dependent on the Course's ID, but has nothing to do with which semester it's taken in. Thus, as you can see, we have duplicate information - several rows telling us that IT101 is programming, and IT102 is Databases. So we fix that by putting the course name into another table, where CourseID is the ENTIRE key.

Primary Key |

CourseID    |  Course Name |
---------------------------|
IT101       | Programming  |
IT102       | Databases    |
IT103       | Web Design   |

No redundancy!

3NF

Okay, so let's say we also add the name of the teacher of the course, and some details about them, into the RDBMS:

|-----Primary Key----|                           uh oh |
                                                       V
Course  |  Semester  |  #Places   |  TeacherID  | TeacherName  |
---------------------------------------------------------------|
IT101   |   2009-1   |  100       |  332        |  Mr Jones    |
IT101   |   2009-2   |  100       |  332        |  Mr Jones    |
IT102   |   2009-1   |  200       |  495        |  Mr Bentley  |
IT102   |   2010-1   |  150       |  332        |  Mr Jones    |
IT103   |   2009-2   |  120       |  242        |  Mrs Smith   |

Now it should be obvious that TeacherName is dependent on TeacherID - so this is not in 3NF. To fix this, we do much the same as we did in 2NF - take TeacherName out of this table, and put it in its own, which has TeacherID as the key.

 Primary Key |

 TeacherID   | TeacherName  |
 ---------------------------|
 332         |  Mr Jones    |
 495         |  Mr Bentley  |
 242         |  Mrs Smith   |

No redundancy!!

One important thing to remember is that if something is not in 1NF, it is not in 2NF or 3NF either. So each additional Normal Form requires everything that the lower ones had, plus some extra conditions, which must all be fulfilled.

Smashery
Thanks for this - right what I was looking for.In the 2NF example - how exactly is the course name dependent on the course's ID? How is the teachers name dependent on the ID?
barfoon
Just think about it in terms of the relationship between things. If I ask you "what is the name of the course with ID IT101?", you can give me an answer, right? Similarly, if I ask you "what teacher has the ID 332?" you can tell me what teacher that is. Thus, the course name is dependent on its ID.
Smashery
However, you can't go the other way - if I were to ask you "What is Mr Jones's ID?" you may not be able to give a unique answer, as there may be two Mr Joneses. So the ID is not dependent on the name - it's the name that is dependent on the ID.
Smashery
You can also think of it this way - look at the third table down (the first with TeacherName in it). What's to stop me having "Mr Jones" in the first row, but then putting "Mr Bloggs" in the second row? I _shouldn't_ be allowed to do that, because they've both got the ID of 332.
Smashery
But if you design your table so that it is not in 3NF, there's nothing to stop you from writing two different names for the one ID - and then your database is contradicting itself!
Smashery
The thing to understand for 2NF is that if you know a course's ID, you already know its name - you don't need its semester as well! That would be like saying "Okay, so your student number is 314156 - but I can't work out your name. Tell me your favourite colour" "Red" "Oh, you must be Johnny"
Smashery
A student number _uniquely_ identifies you - you don't need to know anything else. The row in the database won't say you're Johnny because your favourite colour is Red - it'll tell you you're Johnny because your student ID is 314156. So your name is dependent on the ID, and _only_ the ID.
Smashery
Solid answer - thank you.
barfoon
I see this "so help me Codd" in a lot of places, what does it mean?
instantsetsuna
@instantsetsuna - http://en.wikipedia.org/wiki/Ted_Codd
Smashery
@instantsetsuna - Full explanation: In some courts, a witness is asked if they will tell "The truth, the whole truth, and nothing but the truth, so help me God"; as God is considered to be the one with authority when it comes to knowing if you're telling the truth. In the case of databases, we can say "The data depends on the key, the whole key and nothing but the key, so help me Codd". Ted Codd is the one who came up with the idea of relational databases - things relying on keys, etc., so he would be the authority to go by in the case of relational databases.
Smashery
I understand now, Thanks for taking time to explain! :)
instantsetsuna