For a website like stackoverflow.com, is it good practice to use the email address to identify users across many tables?

Is it bad if the primary key is very long, say varchar(50) or even varchar(100)?

+6  A: 

No. First off, what if the same user asks two questions? If email were the primary key of the question table, the second question would cause a PK violation.

Second, it shouldn't even be part of a composite key. What if a user changes their email address? Then you have an ugly cascade of changes that need to be made across your tables.

Third, you should just use something like an auto-incrementing ID. A string key (like an email address) is much larger than an integer and more expensive to compare in every join and index lookup.

If you need to tie a question to a particular member, have a memberID foreign key into a member table. The answers table should have its own auto-incrementing ID with a questionID foreign key into the question table and a memberID foreign key into the member table representing the member who provided the answer. Etc.
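
A minimal sketch of that layout, using Python's built-in sqlite3 module so it can be run as-is. Only memberID and questionID come from the description above; the table names, the answerID, email, title and body columns, and the sample data are assumptions made for illustration, and a production schema would differ by database.

```python
import sqlite3

# Hypothetical schema: everything except memberID and questionID is an
# illustrative assumption, not taken from a real system.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE member (
    memberID INTEGER PRIMARY KEY AUTOINCREMENT,
    email    TEXT NOT NULL UNIQUE   -- still unique, just not the key
);
CREATE TABLE question (
    questionID INTEGER PRIMARY KEY AUTOINCREMENT,
    memberID   INTEGER NOT NULL REFERENCES member(memberID),
    title      TEXT NOT NULL
);
CREATE TABLE answer (
    answerID   INTEGER PRIMARY KEY AUTOINCREMENT,
    questionID INTEGER NOT NULL REFERENCES question(questionID),
    memberID   INTEGER NOT NULL REFERENCES member(memberID),
    body       TEXT NOT NULL
);
""")

member_id = conn.execute(
    "INSERT INTO member (email) VALUES (?)", ("asker@example.com",)
).lastrowid

# The same member can ask two questions without any key violation,
# because each question row gets its own ID.
conn.executemany(
    "INSERT INTO question (memberID, title) VALUES (?, ?)",
    [(member_id, "First question"), (member_id, "Second question")],
)

# The email is looked up through a join on the small integer key.
rows = conn.execute("""
    SELECT q.title, m.email
    FROM question AS q
    JOIN member AS m ON m.memberID = q.memberID
    ORDER BY q.questionID
""").fetchall()
print(rows)
```

The email column keeps a UNIQUE constraint, so it can still serve as a login identifier while the integer ID does the referencing.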

By the way, you might want to learn about database normalization, at least up to third normal form (3NF). This is not wankery; it's just good common sense.

Jason
If memberID were adopted in a website system, there would be many more join operations.
Steven
The general rule is to normalize until you run into performance problems, then denormalize as necessary to resolve those. Index-based joins are pretty efficient, especially if the foreign keys in the indices are nice, small surrogate keys.
tvanfosson
@Steven: If the foreign keys are designed appropriately and are indexed, it's faster than you might think.
Jason
@Jason: It's been a while since I've had to think about the theoretical aspects, so I might well be wrong, but I don't think using an e-mail address as the primary key for a user really violates third normal form since the e-mail address could be considered a candidate key.
James McNellis
+11  A: 

Not really. For any sizable data set, you'll end up wasting a lot of space and you'll take a performance hit when querying. In addition, if someone changes their e-mail (which you might or might not allow), you've got to change it everywhere.

A surrogate key to uniquely identify the user would be a much better choice.
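
To put a rough number on "change it everywhere", here is a small sketch that counts the rows touched by an email change under each design. The nk_ and sk_ table names, the column names, and the row counts are all hypothetical; foreign key enforcement is left at SQLite's default (off) so the natural-key updates can be issued one table at a time.

```python
import sqlite3

OLD, NEW = "old@example.com", "new@example.com"

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Natural-key design: the email is copied into every referencing table.
CREATE TABLE nk_member   (email TEXT PRIMARY KEY);
CREATE TABLE nk_question (id INTEGER PRIMARY KEY,
                          email TEXT NOT NULL REFERENCES nk_member(email));
CREATE TABLE nk_answer   (id INTEGER PRIMARY KEY,
                          email TEXT NOT NULL REFERENCES nk_member(email));

-- Surrogate-key design: the email is stored once, in the member row.
CREATE TABLE sk_member   (memberID INTEGER PRIMARY KEY,
                          email TEXT NOT NULL UNIQUE);
CREATE TABLE sk_question (id INTEGER PRIMARY KEY,
                          memberID INTEGER NOT NULL REFERENCES sk_member(memberID));
CREATE TABLE sk_answer   (id INTEGER PRIMARY KEY,
                          memberID INTEGER NOT NULL REFERENCES sk_member(memberID));
""")

# One member with three questions and two answers in each design.
conn.execute("INSERT INTO nk_member (email) VALUES (?)", (OLD,))
conn.executemany("INSERT INTO nk_question (email) VALUES (?)", [(OLD,)] * 3)
conn.executemany("INSERT INTO nk_answer (email) VALUES (?)", [(OLD,)] * 2)
conn.execute("INSERT INTO sk_member (memberID, email) VALUES (1, ?)", (OLD,))
conn.executemany("INSERT INTO sk_question (memberID) VALUES (?)", [(1,)] * 3)
conn.executemany("INSERT INTO sk_answer (memberID) VALUES (?)", [(1,)] * 2)


def rows_touched(statements):
    """Run each UPDATE and return the total number of rows it modified."""
    return sum(conn.execute(sql, (NEW, OLD)).rowcount for sql in statements)


nk_touched = rows_touched([
    "UPDATE nk_member   SET email = ? WHERE email = ?",
    "UPDATE nk_question SET email = ? WHERE email = ?",
    "UPDATE nk_answer   SET email = ? WHERE email = ?",
])
sk_touched = rows_touched(["UPDATE sk_member SET email = ? WHERE email = ?"])

print(nk_touched, sk_touched)
```

With the natural key, the number of rows rewritten grows with everything the user has ever posted; with the surrogate key it stays at one.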

James McNellis
If a surrogate key were adopted in a website system, there would be many more join operations.
Steven
Those joins are generally very cheap in the grand scheme of things.
James McNellis
Also, the cost of the existing joins would be reduced, because they are joining on a 4-byte (or 8-byte) integer key instead of a 20-byte (or more) string key.
Ray Hidayat
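
The 4-byte versus 20-byte comparison above can be checked with a bit of arithmetic. This is only a back-of-the-envelope sketch: the sample address and the one-million-row figure are assumptions, and real storage also includes per-row and index overhead that varies by database.

```python
import struct

# Raw size of the key value itself, ignoring any database overhead.
int_key_bytes = struct.calcsize("<i")      # 32-bit integer
bigint_key_bytes = struct.calcsize("<q")   # 64-bit integer
email_key_bytes = len("somebody@example.com".encode("utf-8"))  # hypothetical address

# Hypothetical table of one million referencing rows.
ROWS = 1_000_000
int_total = int_key_bytes * ROWS
email_total = email_key_bytes * ROWS

print(int_key_bytes, bigint_key_bytes, email_key_bytes)
print(email_total / int_total)
```

Every foreign key column and every index on it repeats that per-row difference, which is where the smaller join cost comes from.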
A: 

This post by Jay Pipes, comparing an int and a char as a primary key, may help explain why integers should be used.

Simplecoder
A: 

No, it's a bad idea. Emails change, and string comparisons are relatively expensive.

kyoryu
A: 

Surrogate keys are best. Natural keys are for textbooks. Natural keys have caused serious problems on every system where I have seen them used. Even national ID numbers are not unique enough.

If you have your columns indexed correctly, most modern databases (Oracle, Postgres, SQL Server) will not punish you excessively for joining on an email address. If you are worried about the joins, create a denormalized materialized view and pay the price on insert/update.

MattMcKnight
+2  A: 

In addition to all the perf reasons why you don't want a string as primary key in tables, there are also several very specific reasons why email in particular should not be used as a primary key:

  • Primary keys have to be unique. However, normalizing the email address is hard, so you might have a lot of problems enforcing uniqueness. (Are email addresses case sensitive? Do you ignore . or + inside emails? How do you compare non-English emails?)

  • Email is personally identifiable information. Using it for any purpose can be a security and privacy problem, especially if some of your users are under 13 years old.

  • Email is not immutable, and so should not be used as an identity representation (see "Should I use a number or an email id to identify a user on website?"). Thus, if the user changes their email, you have to either a) update the primary keys of all your tables, or b) maintain the old email just as a key, which makes using the email as a key useless to begin with.
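
To illustrate the uniqueness problem from the first bullet, here is a small sketch comparing raw string equality with one possible normalization policy. The normalize function and its rules are hypothetical: lowercasing the local part and dropping a "+tag" suffix are policy choices that only some mail providers honor, and dots and non-ASCII addresses are not handled at all.

```python
def normalize(email: str) -> str:
    """Apply one hypothetical normalization policy (an assumption, not a standard).

    Lowercases the whole address and drops any "+tag" suffix from the
    local part. Whether two addresses really reach the same mailbox
    depends on the mail provider, so this can merge distinct users or
    miss true duplicates.
    """
    local, _, domain = email.strip().rpartition("@")
    local = local.split("+", 1)[0]
    return f"{local.lower()}@{domain.lower()}"


# Three spellings that may or may not belong to one person.
variants = ["Alice@Example.com", "alice@example.com", "alice+news@example.com"]

distinct_raw = len(set(variants))
distinct_normalized = len({normalize(v) for v in variants})

print(distinct_raw, distinct_normalized)
```

A primary key built on the raw strings treats these as three users, while the normalized form treats them as one; neither answer is guaranteed to be right, which is the difficulty the bullet describes.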

Franci Penov
Good answer. It's entirely possible that 2 users share the same email address (husband and wife possibly?) so you can't be 100% sure they'll be unique.
DBMarcos99
People gain and lose email addys rather frequently. Some web apps also allow accounts to change email address and have multiple email addresses associated.
memnoch_proxy