tags:

views:

270

answers:

5

New to database and so no to get upset with simple questions. As far as my googled and gathered knowledge normalization reduces redundancy of data and increase the performance. But really, I didn't understand what exact reason for dividing the master table into other small tables, applying relationship among them, retrieving the data using all possible unions,subqueries,joins etc., Why can't we have all the data in a single table and retrieve them as on required. I'm little confused.

+1  A: 

We use normalization to reduce the chances of anomalies that may arise as a result of data insertion, deletion, updation. Normalization doesnt necessarily increase performance.

There is much material on internet so i wont repeat the stuff here again. But you can have a look at Normalization rules Anomalies (others aswell)

Umair Ahmed
+2  A: 
  • First normal form (1NF) - every non-key column in a table depends on the key.
  • Second normal form (2NF) - 1NF and every non-key column in a table depends on the entire key.
  • Third normal form (3NF) - 2NF and every non-key column in a table depends on nothing but the key.

This leads to the saying that 3NF depends on the key, the whole key and nothing but the key, so help me Codd (and if you don't know who Codd [or Date] is, you should research them, they're the fathers of relational databases).

Normalization is a valuable (read necessary) tool in ensuring we don't have redundant data (which becomes a real problem if the two redundant areas get out of sync). It doesn't generally increase performance. In fact, although all database should start in 3NF, it's sometimes acceptable to drop to 2NF for performance gains, provided you're aware of the potential problems.

paxdiablo
+4  A: 

The main reason is to eliminate repetition of data, so for example if you had a user with multiple addresses and you stored this information in a single table the user information would be duplicated along with each address entry. Normalisation would seperate the addresses into their own table and then link the two using keys. This way you wouldn't need to duplicate the user data, and your db structure becomes a little cleaner.

Full normalisation will generally not improve performance, in fact it can often make it worse but it will keep your data duplicate free. In fact in some special cases I've denormalised some specific data in order to get a performance increase.

Steve Temple
so if there is no duplication you should not normalize?
01
If you just had a list of users with only ever one address then there would be no need to break the data into multiple tables. I would still advise using a unique key for that data.
Steve Temple
+2  A: 

Normalization comes from the mathematical concept of being "normal." Another word would be "perpendicular." Imagine a regular two-axis coordinate system. Moving up just changes the y coordinate, moving to the side just changes the x coordinate. So every movement can be broken down into a sideways and an up-down movement. These two are independent of each other.

Normalization in database essentially means the same thing: If you change a piece of data, this is supposed to change just one single piece of information in a database. Imagine a database of E-Mails: If you store the ID and the name of the recipient in the Mails table, but the Users table also associates the name to the ID, that means if you change a user name, you don't only have to change it in the users table, but also in every single message that this user is involved with. So, the axis "message" and the axis "user" are not "perpendicular" or "normal."

If on the other hand, the Mails table only has the user ID, any change to the user name will automatically apply to all the messages, because on retrieval of a message, all user information is gathered from the Users table (by means of a join).

balpha
A: 

As well as all the above, it just makes a certain sense. Say you have a user and you want to record what kind of car they have.

Put that all in one table and then you're fine, until someone owns two cars... You're then going to need two rows for that person, and a way of making sure that you can link those two rows together...

And then what if you also want to record how many dogs they have? Same table with lots of confusing dups? Another table with your own custom logic to manage unique users?

Normalization keeps you away from a lot of these problems...

Paddy