views:

74

answers:

4

I am refactoring an old Oracle 10g schema to try to introduce some normalization. In one of the larger tables, there is a text field that has at most, 10-15 possible values. In my mind, it seems that this field is an example of unnecessary data duplication and should be extracted to a separate table.

After examining the data, I cannot find one relevant piece of information that could be associated with that text value. Basically, if I pulled that value out and put it into its own table, it would be the only field in that table. It exists today as more of a 'flag' field. Should I create a two-column table with a surrogate key, keep it as it is, or do something entirely different? Am I doing more harm than good by trying to minimize data duplication on this field?

A: 

If it's really a free-entry text field that's not re-used somewhere else in the database, and there's just a single field without repeated instances, I'd probably go ahead and leave it as it is. If you're determined to break it out I'd create a 'validation' table with a surrogate key and the text value, then put the surrogate key in the base table.

Share and enjoy.

Bob Jarvis
+3  A: 

You might save some space by extracting the column to a separate table. This is called a lookup table. It can give you a couple of other benefits:

  • You can declare a foreign key constraint to the lookup table, so you can rely on the column in the main table never having any value other than the 10-15 values you want.

  • It's easy to query for a concise list of all permitted values, by querying the lookup table. This can be faster than using SELECT DISTINCT on the main table's column. It also returns values that are permitted, but not currently used in the main table.

  • If you change a value in the lookup table, it automatically applies to all rows in the main table that reference it.

However, creating a lookup table with one column is not strictly normalization. You're just replacing one value with another. The attribute in the main table either already supports a normal form, or not.

Using surrogate keys (vs. natural keys) also has nothing to do with normalization. A lot of people make this mistake.

However, if you move other attributes into the lookup table, attributes that depend only on the lookup value and therefore would create repeating groups (violating 3NF) in the main table if you left them there, then that would be normalization.

Bill Karwin
@Bill, how about in my answer where the other values on the lookup table are just meta data for the lookup values -- does that "create" a normalization situation where one did not exist before?
Hogan
@Bill, also does the situation change if the lookup table is referenced from more than one main table?
Hogan
@Hogan: If the lookup values have associated attributes of their own, then keeping them in the main table would create repeating groups, so yes, moving them to the lookup table helps comply with 3NF. I don't think it matters with respect to *normalization* per se that multiple referencing tables are involved. It seems like a good idea to reduce redundancy, though.
Bill Karwin
A: 

If you want normalization break it out.

I think of these types of data in DBs as the equivalent of enums in C,C++,C#. Mostly you put them in the table as documentation.

I often have an ID, Name, Description, and auditing columns for them (eg modified by, modified date, create date, create by, active.) The description field is rarely used.

Example (some might say there are more than just 2)

Gender
ID  Name   Audit Columns...
1   Male
2   Female

Then in your contacts you would have a GenderID column which would link to this one.

Of course you don't "need" the table. You could have external documentation somewhere that says 1=Male, 2=Female -- but I think these tables serve to document a system.

Hogan
A: 

Are these 10-15 values actually meaningful, or are they really just flags? If they're meaningful pieces of text and it seems wasteful to replicate them, then sure create a lookup table. But if they're just arbitrary flag values, then your new table will be nothing more than a mapping from one arbitrary value to another, and not terribly helpful.

A completely separate question is whether all or most of the rows in your big table even have a value for this column. If not, then indeed you have a good opportunity for normalization and can create a separate table linking the primary key from your base table with the flag value.

Edit: One thing. If there's some chance that one of these "flag" values is likely to be wholesale replaced with another value at some point in the future, that would be another good reason to create a table.

Dan