This is a question about static data table design. Given static data in tables like these:

  • Currencies (Code, Name). Row example: USD, United States Dollar
  • Countries (Code, Name). Row example: DE, Germany
  • XXXObjectType (Code, Name, ... additional attributes)
  • ...

does it make sense to have another (INTEGER) column as a Primary Key so that all Foreign Key references would use it?

Possible solutions:

  1. Use additional INTEGER as PK and FK
  2. Use Code (usually CHAR(N), where N is small) as PK and FK
  3. Use Code only if less than a certain size... What size?
  4. Other _______

What would be your suggestion? Why?

I have usually used INT IDENTITY columns, but very often the short code is good enough to show to the user in the UI, in which case the query needs one JOIN less.
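The trade-off between options 1 and 2 can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names are hypothetical, and SQLite stands in for SQL Server because it ships with Python. It shows that displaying the code requires a JOIN with a surrogate key but not with the code as the key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option 1: surrogate INTEGER key (hypothetical table names)
    CREATE TABLE currency_by_id (
        id   INTEGER PRIMARY KEY,
        code CHAR(3) NOT NULL UNIQUE,
        name TEXT NOT NULL
    );
    CREATE TABLE payment_by_id (
        amount      NUMERIC NOT NULL,
        currency_id INTEGER NOT NULL REFERENCES currency_by_id(id)
    );

    -- Option 2: the code itself is the primary and foreign key
    CREATE TABLE currency_by_code (
        code CHAR(3) PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE payment_by_code (
        amount        NUMERIC NOT NULL,
        currency_code CHAR(3) NOT NULL REFERENCES currency_by_code(code)
    );

    INSERT INTO currency_by_id VALUES (1, 'USD', 'United States Dollar');
    INSERT INTO payment_by_id VALUES (100, 1);
    INSERT INTO currency_by_code VALUES ('USD', 'United States Dollar');
    INSERT INTO payment_by_code VALUES (100, 'USD');
""")

# Option 1: a JOIN is needed just to show the currency code.
with_join = conn.execute("""
    SELECT p.amount, c.code
    FROM payment_by_id AS p
    JOIN currency_by_id AS c ON c.id = p.currency_id
""").fetchall()

# Option 2: the code is already in the referencing row.
without_join = conn.execute(
    "SELECT amount, currency_code FROM payment_by_code"
).fetchall()

print(with_join)     # [(100, 'USD')]
print(without_join)  # [(100, 'USD')]
```

A JOIN is still needed in either design when the full name or other attributes are required.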

+7  A: 

An INT IDENTITY is absolutely not needed here. Use the 2- or 3-character mnemonics instead. If you have an entity that has no small, unique property, you should then consider using a synthetic key. But currency codes and country codes aren't the place to do it.

I once worked on a system where someone actually had a table of years, and each year had a YearID. And, true to form, 2001 was year 3, and 2000 was year 4. It made everything else in the system so much harder to understand and query for, and it was for nothing.

Dave Markle
Thanks. I do have a table with DATEs (not YEARs), but I use the DATE itself as the PK. There I store multiple pre-calculated attributes like DOW, IsWorkDay (not weekday), PreviousWorkDay, etc.
van
+1. We do that where I work. I hate it. I have to join 4 or 5 tables just to figure out the actual data of a record, unless I happen to know that country id 43 means that the person lives in the U.S. All for "optimization".
Kevin
Depending on the kind of lookup values you are using, I'd rather look for a standardized set of abbreviations, for example the ISO lists of country and currency codes. These can be quite intuitive to end users.
no_one
@no_one: ISO codes are what I use too. Thanks.
van
+1  A: 

Whether you use an INT ID or a CHAR, referential integrity is preserved in both cases.
An INT is 4 bytes long, so it is the same size as a CHAR(4); if you use CHAR(x) where x<4, your CHAR key will be shorter than an INT one; if you use CHAR(x) where x>4, your CHAR key will be longer than an INT one. For short keys it usually doesn't make sense to use VARCHAR, as the latter has a 2-byte overhead. Anyway, when talking about tables with, say, 500 records, the total overhead of a CHAR(5) key over an INT key would be just 500 bytes, a laughably small value for a database where some tables could have millions of records.
Considering that countries and currencies, for example, are limited in number (a few hundred at most), you have no real gain in using an INT ID instead of a CHAR(4); moreover, a CHAR(4) key can be easier for the end user to remember, and can make your life easier when you have to debug or test your SQL and/or data.
Therefore, though I usually use an INT ID key for most of my tables, in several circumstances I choose to have a PK/FK made of CHARs: countries, languages and currencies are among those cases.
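The size comparison above is simple arithmetic, and a small sketch makes it explicit. The helper name key_overhead_bytes is hypothetical, and the calculation assumes one byte per character (a single-byte CHAR, not NCHAR) and counts only the key column itself.

```python
INT_BYTES = 4  # size of an INT key, as stated in the answer


def key_overhead_bytes(char_len: int, rows: int) -> int:
    """Extra bytes used by a CHAR(char_len) key column compared with an INT
    key column over `rows` rows. Assumes one byte per character."""
    return (char_len - INT_BYTES) * rows


print(key_overhead_bytes(5, 500))  # 500  (CHAR(5) costs 1 extra byte per row)
print(key_overhead_bytes(4, 500))  # 0    (CHAR(4) is the same size as INT)
print(key_overhead_bytes(3, 500))  # -500 (CHAR(3) is smaller than INT)
```

The same per-row difference also applies to every foreign key column and index that stores the key, so the total depends on the row counts of the referencing tables as well.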

Turro
Is there an issue with CHAR(N) columns when the value can be shorter than N? For example, most of the codes are 4 chars ('GOOD', 'COOL'), but some may be shorter (like 'OK'). Could the data contain leading or trailing spaces when it is retrieved?
van
Since CHAR(N) columns are fixed in size, when you retrieve the 'OK' value you will actually get 'OK  ' (if N=4). SQL Server appends trailing spaces, but this should not be an issue if you define PK and FK consistently. To be safe, I usually create a user-defined data type, e.g. udtCodeKey as CHAR(4), which I use for both primary and foreign keys. I have never actually run into trouble, but maybe I've been lucky so far.
Turro
Only after clicking "Add comment" did I notice something that could look like a typo (but is not): the second 'OK  ' has two trailing spaces, like 'OK__', but the font didn't make it clear.
Turro
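The padding behaviour discussed in the comments above can be simulated in a few lines. This is only a sketch: char_n is a hypothetical helper that mimics a fixed-width CHAR(N) column in plain Python and does not talk to SQL Server.

```python
def char_n(value: str, n: int) -> str:
    """Simulate storing `value` in a fixed-width CHAR(n) column by padding
    it with trailing spaces. Values longer than n are rejected."""
    if len(value) > n:
        raise ValueError(f"{value!r} does not fit in CHAR({n})")
    return value.ljust(n)


stored = char_n("OK", 4)
print(repr(stored))             # 'OK  '
print(stored == "OK")           # False: application code sees the padding
print(stored.rstrip() == "OK")  # True once trailing spaces are trimmed
```

Inside SQL Server, the = operator ignores trailing spaces when comparing strings, which is why consistent CHAR(N) keys join correctly; the padding mainly matters in client code that compares or displays the retrieved value.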