views:

495

answers:

9

Exactly how far do you normalize the example below, and exactly which level of normalization does this example meet?

CREATE TABLE "public"."contact_info" (
  "id" SERIAL, 
  "home_phone" TEXT, 
  "mobile_phone" TEXT, 
  "work_phone" TEXT, 
  "fax_phone" TEXT, 
  "email" TEXT, 
  "line1" TEXT, 
  "line2" TEXT, 
  "city" TEXT, 
  "state_id" INTEGER, 
  "zipcode" TEXT, 
  "preferred_type" TEXT, 
  "first_name" TEXT, 
  "last_name" TEXT,
  CONSTRAINT "contact_info_pkey" PRIMARY KEY("id"), 
  CONSTRAINT "contact_info_fk_state_id" FOREIGN KEY ("state_id")
    REFERENCES "public"."states"("id")
    ON DELETE NO ACTION
    ON UPDATE NO ACTION
    NOT DEFERRABLE
) WITH OIDS;

It should be noted that each record can have only 0 or 1 home, mobile, work, or fax number records. Each phone number is fully functionally dependent upon my primary key. From what I know about database normalization, I think this meets 5NF, but fails 6NF.

Since the phone number definitions are set in stone (no new phone number types, no multiple home_phone entries, etc.), is there any reason I would want to split the phone numbers into a separate relation tracked by (contact_info_id, phone_number, type)?

+1  A: 

I would not normalize it. Nothing much to be gained. we aren't running databases on 8-bit computers with 8 mb of ram and a 32mb hard-drive anymore. This is specific to this example. Complex data structures should be normalized.

The only real space saver is the phone number, and its on average 10-char wide, if you normalize that out and replace it with a bigint-foreign-reference you don't seem to gain much but runtime joins :P, and on top of that you need a int that represents what type the phone number is.

Hassan Syed
A: 

One thing that I would point out is that using TEXT columns is a bit of a waste, why not VARCHAR?

An advantage to normalization would be a reduction in duplication of code for validation. Another place where you anticipate future variation is if you needed to add a second address (we have work phone here, but address is assumed to be home address?).

MattMcKnight
Actually, in a PostgreSQL database, using a text column is *not* a waste -- it's the varchar type that produces overhead. However, in MySQL, that statement is reversed. The most efficient column type is highly database-dependent.
Kenaniah
+1  A: 

A general rule of thumb on table design is: if you expect your table to contain a high percentage of NULL values for some columns, then these columns should probably be in separate tables. Anyway a denormalized design may be ok if you don't expect to have performance problems and it helps you when maintaining the database.

Konamiman
A: 

is there any reason I would want to split the phone numbers into a separate relation tracked by (contact_info_id, phone_number, type)?

The likelihood of business rules changing to allow for multiple number types to be associated to a specific person. If you normalize the table, you'd only have to tweak constraints & there's be little to no changes in supporting code to accommodate the change. In current form, you'd have a major data model change and subsequent application changes - very expensive.

I'd like to point out that address (lines 1 & 2, city, state & zipcode) could also be normalized to support storing a persons home, business addresses. I could have numerous summer homes...

OMG Ponies
The reason that the address fields were not abstracted further is because in the design of our application, all of our address fields are fully functionally dependent on the contact_info PK. There's no need to abstract it further, since we do not (and will not) have a need for multiple addresses per contact info record.
Kenaniah
Address or phone number, the rationale is the same. Making a robust data model is a worthy investment.
OMG Ponies
Since there will not be any additional mailing addresses linked to a contact_info record, I don't see the point in further abstracting the mailing address. 1:1 means 1:1. Doing so will merely add to the overhead of runtime joins and query complexity with absolutely no gain given this context.
Kenaniah
I have two mailing addresses - home and work. In your current model, you could only store one of those and you wouldn't know which that it is. Every order entry system worth it's salt stores a billing address, and a shipping address.
OMG Ponies
@OMG, I agree with you. I'm merely stating that there's no need for it in this application. We will only have one address per contact_info record. That is by design. If I was developing an application that was not fully spec'd out, I would most likely abstract the address to allow for a 1:n relationship.
Kenaniah
+1  A: 

Given you say:

It should be noted that each record can have only 0 or 1 home, mobile, work, or fax number records.

... then your table design is a good match as it tightly binds all of these data values to this contact record.

Depending on the application, I could propose a more flexible design where a contact could have multiple addresses of different types, multiple phone numbers of different types and multiple email addresses for different purposes and even multiple phone numbers and/or email addresses to an address.

But then I've worked on several different applications only one of which seriously needed such flexibility (and no, we didn't actually get there): it was records for a school. Each contact record represented a single person and could be any or all of a student (or ex-student), a parent, a teacher, a login-id or some other person of interest. And, yes, we had one school where we very nearly had people in all of those categories!

However, in another application, we did not need this diversity: a software license was allocated to an email address and thus a person. There was one mailing address associated with the purchase and by extension the person and that was pretty much it. We aggregated licenses by email address (one person with multiple licenses) and later added a way to aggregate people with multiple email addresses (one person with multiple email addresses). Physical addresses were more tightly bound to billing information.

staticsan
A: 

Why not put a seperate Table with 3 fields: id, *type_of* and number. Then you could get rid of your *_phone stuff. Something like:

id      type_of      number
1       home         222 11 22
1       work         312 12 12
2       mobile       345 23 23
2       home         233 65 23
2       work         945 30 19

I would sugest to do the same with email, because the person could have an email at home and one at work and an other some where else.

+1  A: 

Because you have defined this system to have only 0 or 1 telephone numbers of each type, your system is meeting a higher level of normalization than a more accurate system would have.

Personally, I would store phone numbers in a separate table with id, public id, type, area code, exchange, suffix and extension. If you ever need to select all with a certain one of these values, it would be much more painful to do programatically in SQL.

In the same way, you have chosen to break the person's name into first and last parts (however neglecting prefix, middle name and suffix). If you can break one column into logical parts, one could argue that you have not passed BCNF.

The choices you'll have to make regarding how far you want to normalize your table depend mainly on how you choose to access the data. For most people, breaking a column like phone numbers into multiple parts is overkill, but if any of my above points seem useful for your needs, then consider it.

PS. What are line1 and line2?

Norla
line1 and line2 have to do with the street portion of the mailing address.
Kenaniah
A: 

I'd split out the phone numbers - for all the reasons mentioned by others, PLUS - if your users ever want to search by phone number, it would be a real hassle to write the sql with this design (assuming you wanted to search across multiple types of phone numbers).

In a "contact info" table, it's very likely that users may want to do this type of search.

Sylvia
A: 

One could argue that zipcode defines (city, state_id), so those should be normalized as dependent columns. However, unless you are writing the app for USPS, this is typically not done. Same with your original question - having 4 phone records doesn't hurt too much, I'd keep the design this way.

SergeyKazachenko