ansaurus

Question

Answer 1

+7 A:

Well, if I had to explain it to my wife it would have been something like that:

The main idea is to avoid duplication of large data.

Let's take a look at a list of people and the country they came from. Instead of holding the name of the country which can be as long as "Bosnia & Herzegovina" for every person, we simply hold a number that references a table of countries. So instead of holding 100 "Bosnia & Herzegovina"s, we hold 100 #45. Now in the future, as often happens with Balkan countries, they split to two countries: Bosnia and Herzegovina, I will have to change it only in one place. well, sort of.

Now, to explain 2NF, I would have changed the example, and let's assume that we hold the list of countries every person visited. Instead of holding a table like:

Person   CountryVisited   AnotherInformation   D.O.B.
Faruz    USA              Blah Blah            1/1/2000
Faruz    Canada           Blah Blah            1/1/2000

I would have created three tables, one table with the list of countries, one table with the list of persons and another table to connect them both. That gives me the most freedom I can get changing person's information or country information. This enables me to "remove duplicate rows" as normalization expects.

Faruz 2010-02-25 05:40:20

Your mentioning 1NF, 2NF, etc. reminded me too much of one database course I took in college... Lost a few marks on every test because skipping about 3 or 4 steps was easier than trying to remember them all... Bleh, glad I'm past that.

Slokun 2010-04-23 17:35:46

Answer 2

+3 A:

I would say that normalization is like keeping notes to do things efficiently, so to speak:

If you had a note that said you had to go shopping for ice cream without normalization, you would then have another note, saying you have to go shopping for ice cream, just one in each pocket.

Now, In real life, you would never do this, so why do it in a database?

For the designing and implementing part, thats when you can move back to "the lingo" and keep it away from layman terms, but I suppose you could simplify. You would say what you needed to at first, and then when normalization comes into it, you say you'll make sure of the following:

There must be no repeating groups of information within a table
No table should contain data that is not functionally dependent on that tables primary key
For 3NF I like Bill Kent's take on it: Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key.

I think it may be more impressive if you speak of denormalization as well, and the fact that you cannot always have the best structure AND be in normal forms.

Kyle Rozendo 2010-02-25 05:53:19

Answer 3

+4 A:

This is what I ask interviewees, why don't we use a single table for an application instead of using a number of tables. The answer is ofcourse normalization. As already said, its to avoid redundancy and there by update anomalies.

shikhar 2010-02-25 05:57:15

+1 for "redundancy " and "anomalies"

gbn 2010-02-25 06:29:01

Answer 4

+1 A:

Normalization is a set of rules that used to design tables that connected through relationships.

It helps in avoiding repetitive entries, reducing required storage space, preventing the need to restructure existing tables to accommodate new data, increasing speed of queries.

First Normal Form: Data should be broken up in the smallest units. Tables should not contain repetitive groups of columns. Each row is identified with one or more primary key. For example, There is a column named 'Name' in a 'Custom' table, it should be broken to 'First Name' and 'Last Name'. Also, 'Custom' should have a column named 'CustiomID' to identify a particular custom.

Second Normal Form: Each non-key column should be directly related to the entire primary key. For example, if a 'Custom' table has a column named 'City', the city should has a separate table with primary key and city name defined, in the 'Custom' table, replace the 'City' column with 'CityID' and make 'CityID' the foreign key in the tale.

Third normal form: Each non-key column should not depend on other non-key columns. For example, In an order table, the column 'Total' is dependent on 'Unit price' and 'quantity', so the 'Total' column should be removed.

lanlantu 2010-02-26 16:27:49

Answer 5

+4 A:

This is not a thorough explanation, but one goal of normalization is to allow for growth without awkwardness.

For example, if you've got a user table, and every user is going to have one and only one phone number, it's fine to have a phonenumber column in that table.

However, if each user is going to have a variable number of phone numbers, it would be awkward to have columns like phonenumber1, phonenumber2, etc. This is for two reasons:

If your columns go up to phonenumber3 and someone needs to add a fourth number, you have to add a column to the table.
For all the users with fewer than 3 phone numbers, there are empty columns on their rows.

Instead, you'd want to have a phonenumber table, where each row contains a phone number and a foreign key reference to which row in the user table it belongs to. No blank columns are needed, and each user can have as few or many phone numbers as necessary.

Nathan Long 2010-02-26 17:04:07

Answer 6

A:

I've read the wiki links on normalization many times but I have found a better overview of normalization from this article. It is a simple easy to understand explanation of normalization up to fourth normal form. Give it a read!

Preview:

What is Normalization?

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

http://databases.about.com/od/specificproducts/a/normalization.htm

Gnatz 2010-04-23 17:04:53

Answer 7

+2 A:

One side point to note about normalization: A fully normalized database is space efficient, but is not necessarily the most time efficient arrangement of data depending on use patterns.

Skipping around to multiple tables to look up all the pieces of info from their denormalized locations takes time. In high load situations (millions of rows per second flying around, thousands of concurrent clients, like say credit card transaction processing) where time is more valuable than storage space, appropriately denormalized tables can give better response times than fully normalized tables.

For more info on this, look for SQL books written by Ken Henderson.

dthorpe 2010-04-23 17:16:36

Answer 8

A:

+1 for the analogy of talking to your wife. I find talking to anyone without a tech mind needs some ease into this type of conversation.

but...

To add to this conversation, there is the other side of the coin (which can be important when in an interview).

When normalizing, you have to watch how the databases are indexed and how the queries are written.

When in a truly normalized database, I have found that in situations it's been easier to write queries that are slow because of bad join operations, bad indexing on the tables, and plain bad design on the tables themselves.

Bluntly, it's easier to write bad queries in high level normalized tables.

I think for every application there is a middle ground. At some point you want the ease of getting everything out a few tables, without having to join to a ton of tables to get one data set.

Kris.Mitchell 2010-04-23 17:30:41

Answer 9

+1 A:

One-to-many relationships should be represented as two separate tables connected by a foreign key. If you try to shove a logical one-to-many relationship into a single table, then you are violating normalization which leads to dangerous problems.

Say you have a database of your friends and their cats. Since a person may have more than one cat, we have a one-to-many relationship between persons and cats. This calls for two tables:

Friends
Id | Name | Address
-------------------------
1  | John | The Road 1
2  | Bob  | The Belltower


Cats
Id | Name   | OwnerId 
---------------------
1  | Kitty  | 1
2  | Edgar  | 2
3  | Howard | 2

(Cats.OwnerId is a foreign key to Friends.Id)

The above design is fully normalized and conforms to all known normalization levels.

But say I had tried to represent the above information in a single table like this:

Friends and cats
Id | Name | Address       | CatName
-----------------------------------
1  | John | The Road 1    | Kitty     
2  | Bob  | The Belltower | Edgar  
3  | Bob  | The Belltower | Howard

(This is the kind of design I might have made if I was used to Excel-sheets but not relational databases.) A single-table approach forces me to repeat some information if I want the data to be consistent. The problem with this design is that some facts, like the information that Bob's address is "The belltower" is repeated twice, which is redundant, and makes it difficult to query and change data and (the worst) possible to introduce logical inconsistencies.

Eg. if Bob moves I have to make sure I change the address in both rows. If Bob gets another cat, I have to be sure to repeat the name and address exactly as typed in the other two rows. E.g. if I make a typo in Bob's address in one of the rows, suddenly the database has inconsistent information about where Bob lives. The un-normalized database cannot prevent the introduction of inconsistent and self-contradictory data, and hence the database is not reliable. This is clearly not acceptable.

Normalization cannot prevent you from entering wrong data. What normalization prevents is the possibility of inconsistent data.

It is important to note that normalization depends on business decisions. If you have a customer database, and you decide to only record a single address per customer, then the table design (#CustomerID, CustomerName, CustomerAddress) is fine. If however you decide that you allow each customer to register more than one address, then the same table design is not normalized, because you now have a one-to-many relationship between customer and address. Therefore you cannot just look at a database to determine if it is normalized, you have to understand the business model behind the database.

JacquesB 2010-07-25 14:34:24

Great, great!!!

Arlen Beiler 2010-07-27 17:52:43

ansaurus

tags:

views:

answers:

Normalization in plain English

related questions