In cases of simple atomic values, I tend to disagree with the common wisdom on this one, mainly on the complexity front. Consider a table containing hats. You can do the "denormalized" way:
CREATE TABLE Hat (
hat_id INT NOT NULL PRIMARY KEY,
brand VARCHAR(255) NOT NULL,
size INT NOT NULL,
color VARCHAR(30) NOT NULL /* color is a string, like "Red", "Blue" */
)
Or you can normalize it more by making a "color" table:
CREATE TABLE Color (
color_id INT NOT NULL PRIMARY KEY,
color_name VARCHAR(30) NOT NULL
)
CREATE TABLE Hat (
hat_id INT NOT NULL PRIMARY KEY,
brand VARCHAR(255) NOT NULL,
size INT NOT NULL,
color_id INT NOT NULL REFERENCES Color(color_id)
)
The end result of the latter is that you've added some complexity - instead of:
SELECT * FROM Hat
You now have to say:
SELECT * FROM Hat H INNER JOIN Color C ON H.color_id = C.color_id
Is that extra join a huge deal? No - in fact, that's the foundation of the relational design model - normalizing allows you to prevent possible inconsistencies in the data. But every situation like this adds a little bit of complexity, and unless there's a good reason, it's worth asking why you're doing it. I consider possible "good reasons" to include:
- Are there other attributes that "hang off of" this attribute? Are you capturing, say, both "color name" and "hex value", such that hex value is always dependent on color name? If so, then you definitely want a separate color table, to prevent situations where one row has ("Red", "#FF0000") and another has ("Red", "#FF3333"). Multiple correlated attributes are the #1 signal that an entity should be normalized.
- Will the set of possible values change frequently? Using a normalized lookup table will make future changes to the elements of the set easier, because you're just updating a single row. If it's infrequent, though, don't balk at statements that have to update lots of rows in the main table instead; databases are quite good at that. Do some speed tests if you're not sure.
- Will the set of possible values be directly administered by the users? I.e. is there a screen where they can add / remove / reorder the elements in the list? If so, a separate table is a must, obviously.
- Will the list of distinct values power some UI element? E.g. is "color" a droplist in the UI? Then you'll be better off having it in its own table, rather than doing a SELECT DISTINCT on the table every time you need to show the droplist.
If none of those apply, I'd be hard pressed to find another (good) reason to normalize. If you just want to make sure that the value is one of a certain (small) set of legal values, you're better off using a CONSTRAINT that says the value must be in a specific list; keeps things simple, and you can always "upgrade" to a separate table later if the need arises.