views:

567

answers:

2

What is a KISS (Keep it Simple, Stupid) way to remember what Boyce-Codd normal form is and how to take a unnormalized table and BCNF it?

Wikipedia's info: not terribly helpful for me.

+3  A: 

Here are some helpful excerpts from the Wikipedia page on Third Normal Form:

Bill Kent defines Third Normal Form this way:

Each non-key attribute "must provide a fact about the key, the whole key, and nothing but the key."

Requiring that non-key attributes be dependent on "the whole key" ensures that a table is in 2NF; further requiring that non-key attributes be dependent on "nothing but the key" ensures that the table is in 3NF.

Chris Date adapts Kent's mnemonic to define Boyce-Codd Normal Form:

"Each attribute must represent a fact about the key, the whole key, and nothing but the key." Here the requirement is concerned with every attribute in the table, not just non-key attributes.

This comes into play when a table has multiple compound candidate keys, and an attribute within one candidate keys has a dependency on a part of another candidate key. Third Normal Form wouldn't prohibit this, because it excludes key attributes. But BCNF applies the rule to key attributes as well.

As for how to make a table satisfy BCNF, you need to represent the extra dependency, with another attribute and possibly by splitting attributes into another table.

Bill Karwin
+9  A: 

Chris Date's definition is actually quite good, so long as you understand what he means:

Each attribute

Your data must be broken into separate, distinct attributes/columns/values which do not depend on any other attributes. Your full name is an attribute. Your birthdate is an attribute. Your age is not an attribute, it depends on the current date which is not part of your birthdate.

must represent a fact

Each attribute is a single fact, not a collection of facts. Changing one bit in an attribute changes the whole meaning. Your birthdate is a fact. Is your full name a fact? Well, in some cases it is, because if you change your surname your full name is different, right? But to a genealogist you have a surname and a family name, and if you change your surname your family name does not change, so they are separate facts.

about the key,

One attribute is special, it's a key. The key is an attribute that must be unique for all information in your data and must never change. Your full name is not a key because it can change. Your Social Insurance Number is not a key because they get reused. Your SSN plus birthdate is not a key, even if the combination can never be reused, because an attribute cannot be a combination of two facts. A GUID is a key. A number you increment and never reuse is a key.

the whole key,

The key alone must be sufficient [and necessary!] to identify your values; you cannot have the same data represented by different keys, nor can a subset of the key columns be sufficient to identify the fact. Suppose you had an address book with a GUID key, name and address values. It is OK to have the same name appearing twice with different keys if they represent different people and are not the "same data". If Mary Jones in accounting changes her name to Mary Smith, Mary Jones in Sales does not change her name as well. On the other hand, if Mary Smith and John Smith have the same street address and it really is the same place, this is not allowed. You have to create a new key/value pair with the street address and a new key.

You are also not allowed to use the key for this new single street address as a value in the address book since now the same street address key would be represented twice. Instead, you have to make a third key/value pair with values of the address book key and the street address key; you find a person's street address by matching their book key and address key in this group of values.

and nothing but the key

There must be nothing other than the key that identifies your values. For example, if you are allowed an address of "The Taj Mahal" (assuming there is only one) you are not allowed a city value in the same record, since if you know the address you would also know the city. This would also open up the possibility of there being more than one Taj Mahal in a different city. Instead, you have to again create a secondary Location key with unique values like the Taj, the White House in DC, and so on, and their cities. Or forbid "addresses" that are unique to a city.

So help me, Codd.

Dour High Arch
+1 for elegant answer formatting.
bouvard
Informative answer, but do you think you could elaborate on the difference between BCNF and 3NF? This description sounds exactly like what I've read for 3NF
Graphics Noob
@Graphics, all BCNF tables are in 3NF, but the reverse is not true if a key has a dependency on another key. Bill Karwin's answer mentions this. My example is of an "address" that can exist in only certain cities. This is allowed in 3NF but not in BCNF. Detecting this requires knowing the meaning of the data, you cannot detect it from the schema. http://en.wikipedia.org/wiki/BCNF has an example of turning 3NF into BCNF.
Dour High Arch