views:

407

answers:

6

Is it better if I use ID nr:s instead of VARCHARS as foreign keys? And is it better to use ID nr:s isntead of VARCHARS as Primary Keys? By ID nr I mean INT!

This is what I have now:

category table:
cat_id ( INT ) (PK)
cat_name (VARCHAR)

category options table:
option_id ( INT ) (PK)
car_id ( INT ) (FK)
option_name ( VARCHAR ) 

I COULD HAVE THIS I THINK:

category table:
cat_name (VARCHAR) (PK)

category options table:
cat_name ( VARCHAR ) (FK)
option_name ( VARCHAR ) ( PK )

Or am I thinking completely wrong here?

+1  A: 
astander
Thanks for the answer, please explain "CODE"... check this Q out to find out my intended use please: http://stackoverflow.com/questions/2100008/can-this-mysql-db-be-improved-or-is-it-good-as-it-is
Camran
Even if a key does change it shouldn't be a problem if your database is correctly set up to cascade changes.
Tony
@Tony I don't think I've ever come across a situation where I would want to cascade a change like that. In fact, I've never used a database where we've had cascade switched on :-)
MatthieuF
@MatthieuF: True, keys (hopefully!) don't change but I guess you might have to deal with it in one way or another if they did.[putting my neck on the line here] I prefer to use natural keys, hence I've had to think about this possibility :)
Tony
+7  A: 

The problem with VARCHAR being used for any KEY is that they can hold WHITE SPACE. White space consists of ANY non-screen-readable character, like spaces tabs, carriage returns etc. Using a VARCHAR as a key can make your life difficult when you start to hunt down why tables aren't returning records with extra spaces at the end of their keys.

Sure, you CAN use VARCHAR, but you do have to be very careful with the input and output. They also take up more space and are likely slower when doing a Queries.

Integer types have a small list of 10 characters that are valid, 0,1,2,3,4,5,6,7,8,9. They are a much better solution to use as keys.

You could always use an integer-based key and use VARCHAR as a UNIQUE value if you wanted to have the advantages of faster lookups.

Atømix
But the problem with numeric identifiers is they are hard to read and you always need to join to other tables to obtain their meaning when looking at your data. Imagine a table that relates three (or more) other tables with all IDs as integers? You have three columns of numbers which have no immediate meaning.
Tony
@Tony: Immediately readable data typically means denormalized data.
OMG Ponies
@Tony. I know what you mean, however, that's what front-ends are for. They are for reading the data. The Database Structure should probably be created for efficiency, ease of maintenance, robustness, and scalability. Readability can be done with some Quick Views if you need it while developing. While it can be tempting to give every key a readable name, in any sizeable database, you will end up with problems trying to make every key readable. Not everything that CAN be done, should be done.
Atømix
@OMG Ponies: I take your point, but it's not always an immediate indication of denormalized data.In the example @astander gave (Stock Exchange company codes) it makes perfect sense to use a CHAR key for the company. They are all unique but also quite readable.
Tony
@Atomiton: Yes, front ends are there to make things easier but can you honestly say you have done all your database problem solving through a front end other than the one provided by the database? :)
Tony
@Tony: I said *typically*, not *always*. Using a VARCHAR should be because there is a natural key, while an integer would be an artificial key because as you pointed out - there's no implicit relationship between the number and the data.
OMG Ponies
@Tony: Why would you look at the key columns? You know they exist, and you know you can join on them. My limited experience is that pure join tables like you've mentioned are rare, and it's easy to write a query to display them in a more understandable form.
David Thornley
@OMG Ponies: I had noticed I'd missed the word "typically", sorry about that. I tried to edit my comment but it's locked now.
Tony
@David Thornley: You look at them because they identify your data. If you were looking for info using a company stock code (to continue the example) why not use the code directly, rather than having to look up a number to then use that to find the data you actually want? As I predicted this is rapidly becoming a discussion about natural vs surrogate keys and it's all my fault... :)
Tony
@Tony: No worries. My main point was, beyond what's already provided, why VARCHAR vs Int would be chosen when data modelling.
OMG Ponies
@Tony: It's not your fault; the whole question is tied up with natural vs. surrogate keys. For a natural key, you use VARCHAR when approriate. For a surrogate key, you'd always use an integer or something. I just seem to be more on the surrogate-key side than you are.
David Thornley
@All. Sure, a company stock code, for example, would make a great readable unique primary key. After all, it should ALWAYS be a unique way to represent a product, right? However, sometimes we can't predict how our data will be used in the future. One scenario ( there are many ) is this: You discontinue an old product and ( accidentally? ) reuse the stock code in the future. You now have no [easy] way to isolate the old from the new data. Note that the point isn't the example, it's that allowing the computer to automate the selection of unique keys future proofs your data better.
Atømix
+2  A: 

with an int you can store up to 2 billion in 4 bytes with varchars you cannot you need to have 10 bytes or so to store that, if you use varchars there is also a 2 byte overhead

so now you add up the 6 extra bytes in every PK and FK + the 2 byte varchar overhead

SQLMenace
Yeah, I mean, when it comes down to it, each character in a VARCHAR is a binary number to a processor. It just has to do the work of translating it using an ASCII lookup table for each letter. "A" == 65 == 00000000 01000001. That's stored for each letter. In fact, if you're using words as Keys, you'd run out of English words LONG before you used the possible number combinations in just TWO BYTES.
Atømix
I don't think the speed of queries is particularly affected by the use of VARCHAR. A database I work on uses CHAR primary keys on some tables with over 4 million records and queries execute in under a second. IU don't think a surrogate integer key would make a lot of difference.
Tony
@Tony char doesn't have the 2 byte overhead of a varcharalso my tables are in the billion row ranges so 2 bytes extra per key is 7 GB extra I have to pay for SAN space just for one table...then backups take longer etc etcwhen you have large table you need to optimize a lot more than with a couple of million rows
SQLMenace
@SQLMenace: Fair point. The database I have at the moment is the largest I've worked with so yes, I can see those extra couple of bytes would cause a problem.Also, thinking about it I don't have any VARCHAR keys, only CHAR! :)
Tony
Overhead is still overhead. It may not be that significant, but unless there's a good reason, it still isn't a great idea, unless there's a good business reason.
Atømix
+1  A: 

There's nothing wrong with either approach, although this question might start the usual argument of which is better: natural or surrogate keys.

If you use CHAR or VARCHAR as a primary key you'll end up using it as a forign key at some point. When it comes down to it, as @astander says, it depends on your data and how you are going to use it.

Tony
Lol. Sometimes I wonder if, in the future, wars will be started over issues like these. You're right, though, it always depends on the data and how it will be used. Personally, I've been burned too many times using non-integer types as a PK, so I put up with the "non-human-readableness" of PKs that don't relate directly to the record.
Atømix
+1  A: 

If you make the category name into the ID you will have a problem if you ever decide to rename a category.

SorcyCat
+2  A: 

When I'm doing design work I ask myself: have I got anything in this data that I can guarantee is going to be non-NULL, unique, and unchanging? If so that's a candidate to be the primary key. If not, I know I have to generate a key value to use. Assuming, then, that my candidate key happens to be a VARCHAR I then look at the data. Is it reasonably short in length (meaning, say, 20 characters or less)? Or is the VARCHAR field rather long? If it's short it's usable as a key - if it's long, perhaps it's better to not use it as a key (although if it's in consideration for being the primary key I'm probably going to have to index it anyways). At least part of my concern is that the primary key is going to have to be indexed and will perhaps be used as a foreign key from some other table. Comparisons of VARCHAR fields tend to be slower than the comparison of numeric fields (particularly binary numeric fields such as integers) so using a long VARCHAR field as a key may result in slow performance. YMMV.

Bob Jarvis