views:

33

answers:

2

I have an industry lookup table: ID, Name.

I have other industry properties such as Industry sector, Industry service, Industry products, etc. These are all required properties for each industry so any industry being entered will have these data. These data are fixed list items like Industry sector = (Primary, Secondary, tertiary). On site, these values will be either auto-suggest or single select drop down list values. Also these will be used as search filters to further filter industries on site. And these will be used for reporting like -> Displaying count of companies belonging to primary sector industries only from people you are friends with.

For schema I see two ways it can be designed:
1) Industry lookup table will have all these additional data as text
2) The additional data will be stored as IDs which FK reference to their lookup tables.
3) Open to other design ideas too.

Issue with #1 is there will be no enforcement of data quality.
Issue with# 2 is there are many many fixed list items, so each having its own lookup table means there will be tons of lookup tables and FKs for the parent tables.

I am not sure in the real world of large scale systems how this is done. Industry is only one entity; I have many entities and each has at least 40-50 fixed list items (columns), so which way is better? For more info, this is a user content website - professional networking website so performance is important.

Suggestions?

+2  A: 

Go with Option 2: If you measure a performance impact of doing many joins, first make sure you have the right indexes for the query workload, and then if join performance is still an issue, possibly denormalise.

Mitch Wheat
A: 

These are all required properties for each industry so any industry being entered will have these data

But will this always be the case? Meaning, may properties be removed or added? This is a likely occurrence in most applications which means you'll be deleting/adding columns to make that happen. This should make you at least consider making these rows, rather than columns.

So I suggest #3:

Rather than having look up tables for each property, have only one. That way you have four tables total:

industries (id, name)
industry_property_names (id, name) // Contains the name of the property, e.g., Industry sector
industry_property_values (id, industry_property_name_id, name) // Primary, Secondary, tertiary
industry_properties ( (pk: industry_id, industry_property_name_id), industry_property_value_id)

Requires some code enforcement during data entry but properties will be dynamic and look ups will be relatively fast.

webbiedave