+5  A: 

This might be fine now, but what happens when someone wants a fifth phone number? Do you keep on adding more and more fields?

Another thing to consider is how you would run a query such as 'Give me all people and their mobile phone numbers', or 'Give me everyone with no phone number'. With a separate table this is easy, but with one table the mobile number could be in any one of four fields, so the query becomes much more complicated.
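To make that concrete, here is a rough sketch of the 'no phone number' query both ways (the table and column names Contacts, Phone1..Phone4, and PhoneNumbers are illustrative, not from the question):

-- Denormalized: 'everyone with no phone number' must check every column
SELECT Name
FROM Contacts
WHERE Phone1 IS NULL AND Phone2 IS NULL
  AND Phone3 IS NULL AND Phone4 IS NULL;

-- Normalized: one anti-join, no matter how many numbers are allowed
SELECT c.Name
FROM Contacts c
LEFT JOIN PhoneNumbers p ON p.ContactId = c.Id
WHERE p.Id IS NULL;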

If you take the normalised approach and in future you want to add additional data about a phone number, you can simply add another column to the phone numbers table, rather than adding four columns to the contacts table.

Going back to the first point about adding more phone numbers in future: if you do add more numbers, you will probably have to amend every query, piece of logic, and form that works with phone number data.

Tom Haigh
The number of phone numbers is less of a problem. Whether I allow 4 or 6, the problem stays the same, and it's OK for me to limit it to what's reasonable. Your second comment is exactly what I think about too (see the part about search in my question) - I'd have to look at 4 fields. I'd like to know how much performance suffers when I do that, especially as they would have to be full-text indexed somehow (I can't limit numbers to US format or anything like that).
Alex
I'm not sure it will be so much of a performance problem, I would be more concerned about the complexity of the queries required to perform simple filtering of the data.
Tom Haigh
All people and their mobile numbers would be solved by using the code approach I've suggested in my answer below. You can optimize your indices based on the searches you intend to offer. Check out http://sqlblog.com/blogs/jonathan_kehayias/archive/2009/02/17/linq-to-sql-the-ongoing-developer-dba-debate.aspx and the links on that page for guidelines.
MadMurf
+8  A: 

Having the data denormalized like that likely won't cause problems, but I wouldn't suggest it. Even though it may be more complex to query, it's better to have well-formed data that you can manipulate in a multitude of ways. I would suggest a database schema like this:

Contacts:
  ID (Primary Key)
  Name
  Job Title

Phone Number Categories:
  ID (Primary key)
  Name

Phone Numbers:
  ID (Primary Key)
  Category_ID (Foreign Key -> Phone Number Categories.ID)
  Contact_ID (Foreign Key -> Contacts.ID)
  Phone Number

This allows you a lot of flexibility in the number of phone numbers allowed, and gives you the ability to categorize them.
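As a rough sketch in SQL (spaces dropped from the table names; the types are illustrative), the schema above and the 'all people and their mobile numbers' query might look like:

CREATE TABLE Contacts (
    ID       int PRIMARY KEY,
    Name     nvarchar(100),
    JobTitle nvarchar(100)
);

CREATE TABLE PhoneNumberCategories (
    ID   int PRIMARY KEY,
    Name nvarchar(50)    -- e.g. 'Home', 'Mobile', 'Business'
);

CREATE TABLE PhoneNumbers (
    ID          int PRIMARY KEY,
    Category_ID int REFERENCES PhoneNumberCategories (ID),
    Contact_ID  int REFERENCES Contacts (ID),
    PhoneNumber nvarchar(22)
);

-- All people and their mobile numbers: one join shape, however many numbers exist
SELECT c.Name, p.PhoneNumber
FROM Contacts c
JOIN PhoneNumbers p            ON p.Contact_ID = c.ID
JOIN PhoneNumberCategories cat ON cat.ID = p.Category_ID
WHERE cat.Name = 'Mobile';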

Mike Trpcic
I agree with @Mike, as mentioned below, but would take this a step further and make the Phone Number Categories table suggested in his answer generic so you can re-use it for other code/description fields elsewhere in your application.
MadMurf
A: 

How about an XML field in the Contacts table? It takes away the complexity of another table.

(Please correct me if this is a bad idea, I've never worked with XML fields before)
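For what it's worth, a minimal sketch of that idea on SQL Server (the xml type and .value() syntax are SQL Server specific; the column and element names here are made up):

ALTER TABLE Contacts ADD PhoneNumbers xml NULL;

-- Pull the first number back out of the XML blob
SELECT Name,
       PhoneNumbers.value('(/phones/phone/text())[1]', 'nvarchar(22)') AS FirstPhone
FROM Contacts;

As the comment below notes, filtering and indexing inside the XML is considerably more awkward than querying a plain child table.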

ZippyV
Typically you want any single column to only keep track of one property.
Rake36
+3  A: 

I'm in favor of the normalized approach. What if you decided that you wanted to add an "Extension" column for business phone numbers? You'd have to create the columns "Extension1", "Extension2", "Extension3", etc. This could become quite tedious to maintain at some point.
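With a separate phone numbers table, by contrast, that change is a single column (a sketch, assuming a PhoneNumbers child table like the one suggested in the other answers):

-- One new nullable column covers every phone number row
ALTER TABLE PhoneNumbers ADD Extension nvarchar(10) NULL;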

Then again, I don't think you can go too wrong either way. It's not like normalization/denormalization will take all that much time if you decide to switch to the other method.

Kevin Pang
+1  A: 

I agree with @Tom that normalizing makes more sense and provides flexibility. If you get your indexes right you shouldn't suffer too much by doing a join between the tables.

As for your normalized table, I'd add a type or code field so you can identify, say, Home, Home 1, Home 2, Business, Bus 1, Bus 2, Mobile, Mob 1, etc.

Id int PK
ContactId int FK<->Contacts.Id
Code char(5)
PhoneNumber nvarchar(22)
Description nvarchar(100)

And store the code types in a separate table, along with other code/description information.

We tend to have a Code Group table with info such as

CODE_GROUP, DESCRIPTION
ST          State
PH          Phone Number
AD          Address Type

And a CODE table with

CODE_ID, CODE_GROUP, DESCRIPTION
MB1      PH          Mobile One
MB2      PH          Mobile Two
NSW      ST          New South Wales   
etc...

You can expand this out to include a long description, short description, ordering, filtering, etc.
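A sketch of the lookup in use (table and column names follow the layout above and are illustrative):

SELECT c.Name, p.PhoneNumber, cd.DESCRIPTION AS PhoneType
FROM Contacts c
JOIN PhoneNumbers p ON p.ContactId = c.Id
JOIN CODE cd        ON cd.CODE_ID = p.Code
                   AND cd.CODE_GROUP = 'PH';  -- only phone-type codes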

MadMurf
+3  A: 

An important principle of denormalization is that it does not sacrifice normalized data. You should always start with a schema that accurately describes your data. As such you should put different kinds of information in different kinds of tables. You should also put as many constraints on your data as you think is reasonable.

All of these goals tend to make queries a teeny bit longer, as you have to join different tables to get the desired information, but with the right names for tables and columns, this shouldn't be a burden from the point of view of readability.

More importantly, these goals can have an effect on performance. You should monitor your actual load to see if your database is performing adequately. If nearly all of your queries are returning quickly, and you have lots of CPU headroom for more queries, then you're done.

If you find that write queries are taking too long, make sure you don't denormalize your data. You will make the database work harder to keep things consistent, since it will have to do many reads followed by many more writes. Instead, you want to look at your indexes. Do you have indexes on columns you rarely query? Do you have the indexes that are needed to verify the integrity of an update?
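On SQL Server, for example, the index usage DMV can help spot indexes that are written far more often than they are read (a sketch; note these stats only cover activity since the last server restart):

SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks + s.user_scans + s.user_lookups AS reads,
       s.user_updates AS writes
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY s.user_updates DESC;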

If read queries are your bottleneck then, once again, you want to start by looking at your indexes. Do you need to add an index or two to avoid table scans? If you just can't avoid the table scans, is there anything you could do to make each row smaller, such as reducing the number of characters in a varchar column, or splitting rarely queried columns into another table to be joined in when they are needed?
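A sketch of that kind of vertical split (the names are made up): park a bulky, rarely read column in a side table keyed by the same ID, and join to it only when it's wanted:

CREATE TABLE ContactNotes (
    Contact_ID int PRIMARY KEY REFERENCES Contacts (ID),
    Notes      nvarchar(max)   -- bulky, rarely read
);

-- Joined only on the occasions the notes are actually needed
SELECT c.Name, n.Notes
FROM Contacts c
JOIN ContactNotes n ON n.Contact_ID = c.ID;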

If there is a specific slow query that always uses the same join, then that query might benefit from denormalization. First verify that reads on those tables strongly outnumber writes. Determine which columns you need to copy from one table to the other. You might want to use slightly different names for those columns so that it's more obvious that they come from denormalization. Alter your write logic to update both the original table used in the join and the denormalized fields.
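As a sketch (the Calls table and @ContactId parameter are hypothetical), that might look like:

-- Calls is the hot table that always joins to Contacts for the name
ALTER TABLE Calls ADD denorm_ContactName nvarchar(100) NULL;

-- The write logic refreshes the copy whenever a contact is renamed
UPDATE cl
SET denorm_ContactName = c.Name
FROM Calls AS cl
JOIN Contacts AS c ON c.ID = cl.Contact_ID
WHERE c.ID = @ContactId;   -- the contact that was just modified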

It's important to note that you aren't removing the old table. The problem with denormalized data is that while it accelerates the specific query it was designed for, it tends to complicate other queries. In particular, write queries must do more work to ensure that the data remains consistent, either by copying data from table to table, by doing additional subselects to make sure that the data is valid, or by jumping over other sorts of hurdles. By keeping the original table, you can leave all your old constraints in place, so at least those original columns are always valid. If you find for some reason that the denormalized columns are out of sync, you can switch back to the original, slower query and everything is valid, and then you can work on ways to rebuild the denormalized data.
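Continuing the sketch above, rebuilding the denormalized copies from the original table is a single set-based statement:

-- One-off resync: every denormalized name rebuilt from the source of truth
UPDATE cl
SET denorm_ContactName = c.Name
FROM Calls AS cl
JOIN Contacts AS c ON c.ID = cl.Contact_ID;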

TokenMacGuy