views:

1282

answers:

4

I have many tables that use Lookup/Enum references for most of their column values. For example:
Person Table - PersonID | RaceCode | HairColorCode | HairStyleCode | TeethConditionCode
Location Table - LocationID | SizeCode | ExteriorColorCode | ConditionCode
Things like Race,Size,Color,Condition,etc would just be foreign key references to a Code lookup table. This code table has other fields but aren't important for my question. The database is for a SaaS application which means each client could have their own list of of Colors, Races, Conditions, etc. There are some codes that would be static which clients couldnt change.

Is it better to have 1 code table or 2 types of code tables (DynamicCodeTable for customer defined ones and StaticCodeTable for those that one change) or should I have a table for each code type (RaceCodeTable, HairColorTable, Condition, etc) ?

The thing I am the most worried about is all the sql joins. The Person table I am working with has 20+ of these code attributes. Is there a difference in performance when joining to 20 different tables VS joining to the same table 20 times? Having multiple tables means each table would be smaller and the lookup 'should' take less time. But having a single table could be quick too. Any suggestions?

A: 

I made a mistake of thinking all these lookup tables would be a great idea when redesigning our pretty wide tables. So much flexibility, etc but it ended up being much harder to code for, it was impossible to navigate around, and it was just a pain in the ass.

So what did I learn?

  • for static values, just use an enum - it's a lot faster and more convenient. This decision has to be made depending on how many other tables may refer to the same variable.
  • stick with fewer lookup tables rather than creating as many as you can think of. JOINs are much slower.
  • to help yourself navigate around, design database VIEWs. It will make your life a lot easier.
  • as a bonus, if you don't want your clients touching certain tables (i.e. your static ones), or touching enum column values, you can use the MySQL (for example) fine-grained permissions to disable changes to certain columns in certain tables. A lot of people don't realize how flexible these permissions can get.
Artem Russakovskii
My gripe with this: if you use enums only, then they're part of your app only. This means 1) you need to release a new version every time something in your lookup values changes, and 2) you have no way on the database to enforce integrity (or you have to "kludge" your way around with messy CHECK constraints). Therefore I would argue to use lookup tables for ALL the lookup values beyond just a true/false field.
marc_s
Or define your lookup tables, with referential integrity as normal, but generate your enum definitions from the database. That way you program against the enums and they match the DB.
GalacticCowboy
+6  A: 

Without knowing more about the application or requirements I would recommend having one table for each code type. IMO the database design would be more clear and self documenting to have foreign keys for each type of code you have.

JD
A: 

There's a potential performance difference.

A table with just 2 rows ties up a lot of space in cache for those two tiny rows.

If you have a lot of lookup values in a single table, you -- effectively -- pack those values more densely into cache.

S.Lott
Each lookup table would be larger than that. Each client could have their own set of HairColor codes. So, each client could have their own 10 colors, 10 conditions, 10 sizes. The question is do I put these 30 codes into one table, or into three? These numbers are for one customer, and ideally we would have many. So a hundred customers could have their own set of 10 codes for each attribute.
Vyrotek
I don't agree at all - if a table has only two columns, say id and value, then a lot more rows will fit on any given 8k page. I don't see how you would waste memory that way. I would argue that it's a cleaner, more "discoverable" design to have separate, distinct lookup tables, especially for lookup values that might change between releases, or that need to be changed by the end user at any given time.
marc_s
@marc_s: a lot of rows "can" fit on an 8k page. If you only have two rows in the lookup, then those two rows are on that page, along with NOTHING else. Effectively wasting a bunch of cache space.
S.Lott
+9  A: 

This topic has been discussed at length over the last fifteen years, under the subject "One True Lookup Table" (abbreviated OTLT). The advantages of such an approach leap out to the database newbie. The drawbacks emerge over time. See this link for one discussion of the drawbacks of OTLT:

http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html

Or google "OTLT" for more discussions.

If you create many lookup tables, and many maintenance screens for them, you can create a view that simulates the OTLT by creating a gigantic UNION that includes every code, every description, and the name of the table where the code-description pair is stored. It's possible to generate such a union using semiautomatic methods, if you know what you're doing. I would imagine that semiautomatic methods would enable you to build a single maintenance screen for hundreds of lookup tables, and then put some logic between that screen and the tables that would insert a new code in the correct table.

As to letting the users introduce new code TYPES, and not just new code VALUES, that opens a whole big can of worms. See the above article discussing EAV. This is very seductive, because it allows the users to design their own underlying data structure. If you disregard performance, this works pretty well for a while. You get a perfectly general database without having to learn the data structure from the users or the subject matter experts.

When it runs into real grief is when you try to use the data as if it were an integrated database, and not just a hodge podge of disjointed opinions about the data. At this point, you're into some serious data archaeology, when your customers expect routine report generation. Good luck.

(Editted to change "data mining" to "data archaeology")

Walter Mitty