Consider that there is a bunch of tables which link to "countries" or "currencies" tables.

To make the data easier to read, I'd like to make a CHAR field holding the country code (e.g. US, GB, AU) or the currency code (e.g. USD, AUD) the primary key of each of those two tables, and have all other tables use this CHAR as a foreign key.

The database is MySQL with the InnoDB engine.

Is it going to cause performance issues? Is it something I should avoid?

+1  A: 

This will help you:

http://forums.mysql.com/read.php?153,243809,243818#msg-243818

James Skidmore
Please don't just post links to other sources. Include some context around it. This is a canonical source of programming tidbits, and links rot.
Eric
Context?! We don't need no stinkin' context...
OMG Ponies
@Eric, sorry about that. I didn't have much time to help out, but I saw that link and figured it could at least be of assistance in addition to other answers. I'll only post if I can include some context in the future.
James Skidmore
+1  A: 

James Skidmore's link is important to read.

If you're limiting yourself to country and currency codes (2 and 3 characters, respectively), you may very well be able to get away with declaring the columns char(2) and char(3).

I would guess that would not be a no-no. If you're using an 8-bit character encoding, you're looking at columns the size of smallint or mediumint, respectively.
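As a rough check of that size comparison, here is a small sketch. The integer sizes are MySQL's documented storage sizes; the helper name `char_key_bytes` is made up for illustration, and the result is only an upper bound that assumes a fixed number of bytes per character (1 for an 8-bit encoding).

```python
# MySQL's documented storage sizes for integer types, in bytes.
INT_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def char_key_bytes(length: int, max_bytes_per_char: int = 1) -> int:
    """Upper bound on the bytes one CHAR(length) value occupies.

    Hypothetical helper: assumes every character takes max_bytes_per_char bytes.
    """
    return length * max_bytes_per_char


country_key = char_key_bytes(2)   # CHAR(2) country code, 8-bit encoding
currency_key = char_key_bytes(3)  # CHAR(3) currency code, 8-bit encoding

for label, size in (("CHAR(2)", country_key), ("CHAR(3)", currency_key)):
    # List the integer types that occupy the same number of bytes.
    same = [name for name, width in INT_BYTES.items() if width == size]
    print(f"{label}: {size} bytes, same as {', '.join(same)}")
```

With a multi-byte character set the bound grows (pass a larger `max_bytes_per_char`), so the comparison only holds as stated for single-byte encodings.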

timdev
+15  A: 

Performance isn't really the main issue, at least not for me. The issue is more about surrogate vs natural keys.

Country codes aren't static. They can and do change. Countries change names (eg Ethiopia to Eritrea). They come into being (eg the breakup of Yugoslavia or the Soviet Union) and they cease to exist (eg West and East Germany). When this happens the ISO standard code changes.

More in Name Changes Since 1990: Countries, Cities, and More

Surrogate keys tend to be better because when these events happen the keys don't change, only columns in the reference table do.

For that reason I'd be more inclined to create country and currency tables with an int primary key instead.
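A minimal sketch of that layout, using Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB (an assumption made so the example runs anywhere). The table and column names are made up for illustration. The rename used is Zaire (ZR) becoming the Democratic Republic of the Congo (CD).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
CREATE TABLE countries (
    id   INTEGER PRIMARY KEY,        -- surrogate key, never changes
    code CHAR(2) NOT NULL UNIQUE,    -- ISO code, allowed to change
    name TEXT NOT NULL
);
CREATE TABLE customers (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    country_id INTEGER NOT NULL REFERENCES countries(id)
);
INSERT INTO countries (id, code, name) VALUES (1, 'ZR', 'Zaire');
INSERT INTO customers (id, name, country_id) VALUES (10, 'Alice', 1), (11, 'Bob', 1);
""")

before = conn.execute("SELECT id, country_id FROM customers ORDER BY id").fetchall()

# The country is renamed: only the one row in the reference table is updated.
cur = conn.execute(
    "UPDATE countries SET code = 'CD', name = 'Democratic Republic of the Congo' "
    "WHERE id = 1"
)
rows_touched = cur.rowcount

after = conn.execute("SELECT id, country_id FROM customers ORDER BY id").fetchall()

# Reading the code now requires a join, which is the readability cost.
codes = conn.execute(
    "SELECT cu.name, co.code FROM customers cu "
    "JOIN countries co ON co.id = cu.country_id ORDER BY cu.id"
).fetchall()
```

The referencing rows are identical before and after the rename; the trade-off is that every query wanting the readable code has to join to `countries`.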

That being said, varchar key fields will use more space and have certain performance disadvantages that probably won't be an issue unless you're performing a huge number of queries.

For completeness, you may want to refer to Database Development Mistakes Made by App Developers.

cletus
Ethiopia changed its name?!?
SeanJA
Damn the phone, made me get up in the middle of typing this exact answer. Well said!
Eric
@SeanJA: according to that link, yes. It might've been a temporary change.
cletus
+1 - Well explained.
James Black
Even more importantly, things that by definition ought to be unique, aren't (Social Security numbers, Passport numbers and so on)
Vinko Vrsalovic
Well, if a country code is renamed, you simply change it in one table and all the other tables are updated via a trigger (or, at worst, you have to do it yourself if triggers are not supported), but this is a minor issue.
alexeit
@alexeit: If you use surrogate keys, you don't have to fire triggers that will update potentially millions of rows on a change.
Eric
Abyssinia became Ethiopia; Eritrea broke off from Ethiopia and is now a separate country.
Jonathan Leffler
A: 

My answer is that there isn't a clear-cut answer. Just pick an approach within your project and be consistent. Both have their pluses and minuses.

@cletus makes a good point about using generated keys, but when you run into a situation where the data is relatively static, like country codes, introducing a generated key for them seems overly complex. Despite real world politics, having country codes appear and disappear isn't really going to be much of an issue for most business problems (but if your data actively concerns all 190-210 countries, follow that advice).

Using surrogate keys universally is a good and popular strategy. But remember, it comes in response to modeling databases using natural keys for everything. Ack! Open up a 15-year-old database book. Using natural keys everywhere definitely gets you into difficult situations, as the initial understanding of the problem domain proves wrong. You do want to have consistency in your modelling practices, but using different techniques for clearly different situations is OK.

I suspect that performance for most modern databases on varchar(2) foreign keys will be the same as (or better than) int fields. Databases have for years supported textual foreign keys.

Given that we have no other information about the project, if your preference is to use the country codes as foreign keys, and you have the option to do so, I'd say it's OK. It'll be easier to work with the data. It is a little against current practices, but in this case it's not going to back you into some corner.
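For comparison, here is a sketch of the natural-key layout, again using Python's built-in sqlite3 module as a stand-in for MySQL/InnoDB (an assumption; the table names and data are made up). It uses `ON UPDATE CASCADE` on the foreign key, which InnoDB also supports, so a changed code propagates to referencing rows.

```python
import sqlite3

nk = sqlite3.connect(":memory:")
nk.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
nk.executescript("""
CREATE TABLE countries (
    code CHAR(2) PRIMARY KEY,        -- natural key: the ISO code itself
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id           INTEGER PRIMARY KEY,
    country_code CHAR(2) NOT NULL
        REFERENCES countries(code) ON UPDATE CASCADE
);
INSERT INTO countries (code, name) VALUES ('US', 'United States'), ('ZR', 'Zaire');
INSERT INTO orders (id, country_code) VALUES (1, 'US'), (2, 'ZR');
""")

# The readable code is right in the referencing table: no join needed.
readable = nk.execute("SELECT id, country_code FROM orders ORDER BY id").fetchall()

# A code change in the reference table cascades to every referencing row.
nk.execute(
    "UPDATE countries SET code = 'CD', name = 'Democratic Republic of the Congo' "
    "WHERE code = 'ZR'"
)
cascaded = nk.execute("SELECT id, country_code FROM orders ORDER BY id").fetchall()
```

The cascade keeps the data consistent, but it rewrites every referencing row, which is the cost raised in the comments on the surrogate-key answer.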

ndp
-1 This is actually quite wrong. As pointed to in http://forums.mysql.com/read.php?153,243809,243818#msg-243818 by James, there are things MySQL (the database in question) doesn't do with varchars that it does with int keys.
cletus
I was just guessing that since databases did this for years, it would be optimized. Wouldn't be the first time that assumption proved wrong! But that post is a slightly different question ("is using VARCHAR(45) a good choice for a primary key?"). This problem is CHAR(2) on a 200-row table (the number of countries). Unfortunately that post doesn't discuss FK index performance in general, or whether CHAR(2) is going to be more efficient than VARCHAR(2), and I couldn't dig it up. Thanks for the link.
ndp
I agree. It discusses how bad it is to have a 45-byte char, since it is 4-5 times bigger than a normal int, but with char(2) or char(3) there won't be much difference size-wise.
alexeit