For example, on a website like stackoverflow.com, is it good practice to use the email address to identify users in many tables?
Is it bad if the primary key is very long, say varchar(50) or even varchar(100)?
No. First off, what if the same user asks two questions? If email were the primary key of the questions table, the second question would cause a PK violation.
Second, it shouldn't even be part of a composite key. What if a user changes their email address? Then you have an ugly cascade of changes that need to be made across your tables.
Third, you should just use something like an auto-incrementing ID. A string (like an email address) would be horribly inefficient.
If you need to tie a question to a particular member, have a memberID foreign key into a member table. The answers table should have its own auto-incrementing ID with a questionID foreign key into the question table and a memberID foreign key into the member table representing the member who provided the answer. Etc.
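To make that layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (member, question, answer) and the memberID / questionID columns come from the description above; the answerID, title, body, and email columns and the sample data are my own assumptions for illustration, and a real schema would have more in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE member (
    memberID INTEGER PRIMARY KEY AUTOINCREMENT,
    email    TEXT NOT NULL UNIQUE   -- assumed column: email is data, not the key
);
CREATE TABLE question (
    questionID INTEGER PRIMARY KEY AUTOINCREMENT,
    memberID   INTEGER NOT NULL REFERENCES member(memberID),
    title      TEXT NOT NULL        -- assumed column
);
CREATE TABLE answer (
    answerID   INTEGER PRIMARY KEY AUTOINCREMENT,
    questionID INTEGER NOT NULL REFERENCES question(questionID),
    memberID   INTEGER NOT NULL REFERENCES member(memberID),
    body       TEXT NOT NULL        -- assumed column
);
""")

# One member asks two questions; another member answers the first one.
asker = conn.execute(
    "INSERT INTO member (email) VALUES (?)", ("asker@example.com",)
).lastrowid
answerer = conn.execute(
    "INSERT INTO member (email) VALUES (?)", ("answerer@example.com",)
).lastrowid

q1 = conn.execute(
    "INSERT INTO question (memberID, title) VALUES (?, ?)", (asker, "First question")
).lastrowid
q2 = conn.execute(
    "INSERT INTO question (memberID, title) VALUES (?, ?)", (asker, "Second question")
).lastrowid

conn.execute(
    "INSERT INTO answer (questionID, memberID, body) VALUES (?, ?, ?)",
    (q1, answerer, "An answer"),
)

# The same member can own any number of questions without a key conflict.
question_count = conn.execute(
    "SELECT COUNT(*) FROM question WHERE memberID = ?", (asker,)
).fetchone()[0]
```

Because each table has its own generated ID, the same member can appear in as many question or answer rows as needed, and the foreign keys reject rows that point at a member or question that does not exist.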
By the way, you might want to learn about database normalization, at least up to third normal form (3NF). This is not wankery, it's just good common sense.
Not really. For any sizable data set, you'll end up wasting a lot of space and you'll take a performance hit when querying. In addition, if someone changes their e-mail (which you might or might not allow), you've got to change it everywhere.
A surrogate key to uniquely identify the user would be a much better choice.
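A quick sketch of what the surrogate key buys you when an email changes, again with Python's sqlite3. The schema and sample addresses are assumptions made up for this example: the point is only that the update touches one row in the member table while every referencing row stays as it was.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (
    memberID INTEGER PRIMARY KEY,      -- surrogate key
    email    TEXT NOT NULL UNIQUE      -- still unique, but not the key
);
CREATE TABLE question (
    questionID INTEGER PRIMARY KEY,
    memberID   INTEGER NOT NULL REFERENCES member(memberID),
    title      TEXT NOT NULL
);
""")

member_id = conn.execute(
    "INSERT INTO member (email) VALUES (?)", ("old@example.com",)
).lastrowid
conn.executemany(
    "INSERT INTO question (memberID, title) VALUES (?, ?)",
    [(member_id, "Q1"), (member_id, "Q2"), (member_id, "Q3")],
)

# The member changes their address: exactly one row is rewritten.
cursor = conn.execute(
    "UPDATE member SET email = ? WHERE memberID = ?",
    ("new@example.com", member_id),
)
rows_touched = cursor.rowcount

# All questions are still reachable through the unchanged memberID.
questions_still_linked = conn.execute(
    """
    SELECT COUNT(*)
    FROM question q
    JOIN member m ON m.memberID = q.memberID
    WHERE m.email = ?
    """,
    ("new@example.com",),
).fetchone()[0]
```

Had email been the key, the same change would have needed a matching update in every table that references the member.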
This post by Jay Pipes on comparing the differences between an int and a char for a primary key may help in understanding why integers should be used.
No, it's a bad idea. Emails change, and string comparisons are relatively expensive.
Surrogate keys are best. Natural keys are for textbooks. Natural keys have caused serious problems on every system where I have seen them used. Even national ID numbers are not unique enough.
If you have your columns indexed correctly, most modern databases (Oracle, Postgres, SQL Server) will not punish you excessively for joining on an email address. If you are worried about the joins, create a denormalized materialized view and pay the price on insert/update.
In addition to all the perf reasons why you don't want a string as primary key in tables, there are also several very specific reasons why email in particular should not be used as a primary key:
Primary keys have to be unique. However, normalizing the email address is hard. You might have a lot of problems enforcing the uniqueness. (Are email addresses case sensitive? Do you ignore . or + inside emails? How do you compare non-English emails?)
Email is personally identifiable information. Using it for any purpose can be a security and privacy problem, especially if some of your users are under 13 years old.
Email is not immutable, so it should not be used as an identity representation (see "Should I use a number or an email id to identify a user on website?"). Thus, if the user changes their email, you have to either a) update the primary keys of all your tables, or b) maintain the old email just as a key, which makes using the email as a key useless to begin with.
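To illustrate the uniqueness problem from the first point, here is a rough sketch in Python with sqlite3. The normalize_email function is a hypothetical policy of my own (trim and lowercase the whole address), not a recommendation, and the email_normalized column and add_member helper are likewise assumptions for the example. Whatever policy you pick, some addresses you consider "the same" will slip through or some you consider different will collide.

```python
import sqlite3


def normalize_email(address: str) -> str:
    """Hypothetical policy: trim whitespace and lowercase the whole address."""
    return address.strip().lower()


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE member (
    memberID         INTEGER PRIMARY KEY,   -- surrogate key
    email            TEXT NOT NULL,         -- address as the user typed it
    email_normalized TEXT NOT NULL UNIQUE   -- assumed column used for uniqueness
)
""")


def add_member(conn: sqlite3.Connection, email: str):
    """Insert a member; return the new memberID, or None if the address collides."""
    try:
        cursor = conn.execute(
            "INSERT INTO member (email, email_normalized) VALUES (?, ?)",
            (email, normalize_email(email)),
        )
        return cursor.lastrowid
    except sqlite3.IntegrityError:
        return None


first = add_member(conn, "Alice@Example.com")
# Differs only by case, so this policy treats it as a duplicate.
duplicate = add_member(conn, "alice@example.com")
# A "+" alias is accepted as a separate member under this policy.
plus_variant = add_member(conn, "alice+news@example.com")
```

With a surrogate key, a mistake in the normalization policy only affects one unique constraint that you can fix later. With email as the primary key, the same mistake is baked into every foreign key.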