views:

236

answers:

8

What is your choice for primary key in tables that represent a person (like Client, User, Customer, Employee etc.)? My first choice would be an SSN number. However, using SSN has been discouraged because of privacy concerns and different regulations. SSN can change during person lifetime, so that is another reason against it.

I guess that one of the functions of well chosen natural primary key is to avoid duplication. I do not want a person to be registered twice in the database. Some surrogate or generated primary key does not help in avoiding duplicate entries. What is the best way to approach this?

EDIT:

What is the best way to guarantee uniqueness in your application for person entity and can this be handled on database level with primary key or uniqueness constraint?

+6  A: 

I don't know which Database engine you are using, but (at least with MySQL -- see 7.4.1. Make Your Data as Small as Possible), using an integer, the shortest possible, is generally considered best for performances and memory requirements.

I would use an integer, auto_increment, for that primary key.
The idea being :

  • If the PK is short, it helps identifying each row (it's faster and easier to compare two integers than two long strings)
  • If a column used in foreign keys is short, it'll require less memory for foreign keys, as the value of that column is likely to be stored in several places.

And, then, set a UNIQUE index on an other column -- the one that determines unicity -- if that's possible and/or necessary.


Edit: Here are a couple of other questions/answers that might interest you :

Pascal MARTIN
What would that column be and can you guarantee person entity uniqueness on database level?
Dan
What makes a person unique is a more complex question : not the name, not the address, not the birthdate, not... ;; Maybe a combinaison of some of those ? Like firstname + middle names (not sure how those are called in english) + lastname + birthdate + place of birth + gender ;; those are generally the ones used on "administrative" forms, here, and should do not too bad, I suppose ?
Pascal MARTIN
@dan the only time the guarantee breaks down is if you have multiple physical database servers and do not have the correct application logic (or replication) to keep the person synchronized. For this you need to provision for logic to keep the person-id a globally unique (note: I am not implying you should GUID's)
Hassan Syed
@pascal (+1) it is indeed complex, which is why a integer type is needed, especially one that can maintain global uniqueness.
Hassan Syed
+1  A: 

Use an autogenerated integer primary key, and then put a unique constraint on anything that you believe should be unique. But SSNs are not unique in the real world so it would be a bad idea to put a uniqueness constraint on this column unless you think turning away customers because your database won't accept them is a good business model.

Mark Byers
A: 

To add to @Mark and @Pascal (autoincrement integers are your best bet) -- SSN's are usefull and should be modelled correctly. Security concerns are part of application logic. You can normalize them into a separate table, and you can make them unique by providing a date-issued field.

p.s., to those who disagree with the `security in application' point, an enterprise DB will have a granular ACL model; so this won't be a sticking point.

Hassan Syed
A: 

I prefer natural keys, but a table person is a lost case. SSNs are not unique and not everybody has one.

just somebody
+2  A: 

As mentioned above, use an auto-increment as your primary key. But I don't believe this is your real question.

Your real question is how to avoid duplicate entries. In theory, there is no way - 2 people could be born on the same day, with the same name, and live in the same household, and not have a social insurance number available for one or the other. (One might be a foreigner visiting the country).

However, the combination of full name, birthdate, address, and telephone number is usually sufficient to avoid duplication. Note that addresses may be entered differently, people may have multiple phone numbers, and people may choose to omit their middle name or use an initial. It depends on how important it is to avoid duplicate entries, and how large is your userbase (and thus the likelihood of a collision).

Of course, if you can get the SSN/SIN then use that to determine uniqueness.

Larry Watanabe
A: 

I'd recommend a surrogate key. Add all the indexes you need for other candidate keys, but keeping business logic out of the key is my recommendation.

duffymo
A: 

What attributes are available to you? Which ones does your application care about ? For example no two people can be born at exactly the same second at exactly the same place, but you probably don't have access to that data at that level of accuracy! So you need to decide, from the attributes you intend on modeling, which ones are sufficient to provide an acceptable level of data integrity. Whatever you choose, you're right in focusing on the data integrity aspects (preventing insertion of multiple rows for the same person) of your selection.

For Joins/Foreign Keys in other tables, it is best to use a surrogate key.

I've grown to consider the use of the word Primary Key as a misnomer, or at best, confusing. Any key, whether you flag it as Primary Key, Alternate Key, Unique Key, or Unique Index, is still a Key, and requires that every row in the table contain unique values for the attributes in the key. In that sense, all keys are equivilent. What matters more (Most), is whether they are natural keys (dependant on meaningful real- domain model data attributes), or surrogates (Independendant of real data attributes)

Secondly, what also matters is what you use the key for.. Surrogate keys are narrow and simple and never change (No reason to - they don't mean anything) So they are a better choice for joins or for foreign Keys in other dependant tables.

But to ensure data integrity, and prevent insertion of multiple rows for the same domain entity, they are totally useless... For that you need some kind of Natural Key, chosen from the data you have available, and which your application is modeling for some purpose.

The key does not have to be 100% immutable. If (as an example), you use Name and Phone Number and Birthdate, for example, even if a person changes their name, or their phone number, you can simply change the value in the table. As long as no other row already has the new values in their key attributes, you are fine.

Even if the key you select only works in 99.9% of the cases, (say you are unlucky enough to run into two people with the same name and phone number and were coincidentally born the same day), well, at least 99.9% of your data will be guaranteed to be accurate and consistent - and you can for example, just add time to their birthdate to make them unique, or add some other attribute to the key to distinquish them. As long as you don't have to update data values in Foreign Keys throughout your database because of the change, (since you are not using this key as a FK elsewhere) you are not facing any significant issue.

Charles Bretana
A: 

I prefer natural keys, when they can be trusted.

Unless you are running a bank or something like that, there is no reason for your clients and users to provide you with a valid SSN, or even necessarily to have one. Thus, for business reasons, you are forced to distrust SSN in the case you outline. A similar argumant would hold for any given natural key to "persons".

You have no choice but to assign an artificial (Read "surrogate") key. It might as well be an integer. Make sure it's big enough integer so you aren't going to need toexpand it real soon.

Walter Mitty