I'm building a web site for the students of one specific university. Should I use standard MySQL IDs (integer, auto increment), or should I derive the IDs from the students' email addresses? For example, if the student's email address is [email protected], then his/her ID on my web site would be e12234. Is that OK? What about performance?

Edit: There are also email addresses like these: [email protected] (exchange student), [email protected] (professor).

A: 

Generally you'd want to map strings to integer IDs and reference the ID everywhere:

CREATE TABLE `student` (
    `id` int unsigned NOT NULL auto_increment,
    `email` varchar(150) NOT NULL,
    PRIMARY KEY  (`id`)
);

This will reduce the size of any table that references the student table, as it will be storing an INT instead of a VARCHAR.

Also, if you used part of their email as the ID and a user ever changed their email, you'd have to go back through every table and update their ID.
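As a rough sketch of the point above, here is how a referencing table might look. The `enrollment` table, its `course_code` column, the id value 42, and the example.edu address are all hypothetical names made up for illustration, and the foreign key assumes both tables use InnoDB:

```sql
-- Hypothetical child table; `enrollment` and `course_code` are illustrative only.
CREATE TABLE `enrollment` (
    `student_id` int unsigned NOT NULL,      -- 4-byte reference to student.id
    `course_code` varchar(20) NOT NULL,
    PRIMARY KEY (`student_id`, `course_code`),
    FOREIGN KEY (`student_id`) REFERENCES `student` (`id`)
) ENGINE=InnoDB;

-- If an email ever changes, only one row in `student` is touched;
-- rows in `enrollment` keep pointing at the same integer id.
UPDATE `student`
SET `email` = 'new.address@example.edu'     -- placeholder address
WHERE `id` = 42;                            -- placeholder id
```

With an email-derived key, the same change would have to be applied to every referencing row as well.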

methodin
All of your arguments are valid, but why an extra table for the emails? He could use the ID as the primary key and just have a column for the email address. That is a valid database design, because it's still in third normal form. Your answer makes no sense to me. An extra table would be fine if a user could have multiple email addresses, but you are using the reference key as the primary key of the table, so I don't see any performance gain in doing so, only a performance loss from the join required to get the email address.
citronas
You're right. I misnamed the table. It should be 'student'. Fixed.
methodin
Changing the email address is not possible, because it is given only once to each student.
ilhan
Regardless, an int uses 4 bytes, whereas a varchar column holding up to 30 characters can use up to 31 bytes. Also, is there any guarantee they won't ever re-use an email address?
methodin
Okay then, using an int makes more sense (because of the space required). The email address is the same as the student's identification number, so it is given only once, to only one student. No re-use. If a student becomes a professor it might be a problem, so I'm going to use an int.
ilhan
+1  A: 

I would strongly recommend a separate, independent value for the ID (integer, auto increment). ID values should never change and never be updated. People change their emails all the time, and corporations sometimes reissue the same email address to new users.

bzarah
This is only for one specific university, so changing email addresses is not possible.
ilhan
I can't count the number of times I've been told that a given requirement will NEVER EVER EVER EVER change, only to be told later that it has. :) Even if that's the case, I would still go with the separate ids. As @Wrikken and @methodin have pointed out, it will make the joins quicker. There really isn't a downside to going this way, is there?
bzarah
A: 

If an email address is unique and static in your population (and make very sure it is), you may make it the primary key, and full normalization would actually favor that option. There are, however, some pitfalls to consider:

  1. People change email addresses once in a while. What if a student becomes a professor, or is harassed at his/her email address, applies for a new one and gets it? The primary key should not change, ever, so there goes your schema.
  2. Sanitizing email addresses takes a little more effort than sanitizing integers.
  3. Depending on how many foreign keys point to this ID, the required storage space could increase, and joining on CHARs rather than INTs could hurt performance (you should test that, though).
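
One possible way to keep the uniqueness guarantee without using the email as the key is a unique index next to an integer primary key. This is only a sketch: it assumes a `student` table with an integer `id` primary key and an `email` column, and the index name `uq_student_email` and the example.edu address are made up for illustration:

```sql
-- Enforce "one email, one student" without making the email the primary key.
ALTER TABLE `student`
    ADD UNIQUE KEY `uq_student_email` (`email`);   -- hypothetical index name

-- Lookups by email use the unique index; every other table joins on the integer id.
SELECT `id`
FROM `student`
WHERE `email` = 'e12234@example.edu';              -- placeholder address
```

Whether joins on the integer key are measurably faster than joins on the string key still depends on the data, so it is worth testing both.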
Wrikken
Because of storage space and performance, I give up on using the email address as the ID.
ilhan