views:

124

answers:

6

I'm fairly new to database management and this question never seems to be answered in more than one sentence. All other SO answers say "A candidate key is a minimal super key." That means nothing to me.

A candidate key is supposed to specify uniqueness of a db record, correct? And a primary key is a candidate key. If a primary key already specifies uniqueness, what's point of adding more candidate keys?

I have seen example records like the following:

Employee(ID, Name, PhoneNumber)

where ID is the primary key and PhoneNumber is a candidate key. From what I see, the ID is enough to specify the uniqueness of an employee record. Although PhoneNumbers are (probably) unique, specifying them as a candidate key does not seem "minimal" to me.

+4  A: 

It means that if PhoneNumber was indeed a candidate key you could delete the ID column and use PhoneNumber instead. In other words, it is a candidate for being a unique key.

Wikipedia has a more formal definition that you many want to look at.

klausbyskov
+2  A: 

A key is called a candidate key, because while it could be used as a PK, it is not necessarily the PK.

There can be more than one candidate key for a given row, e.g., EmployeeID and SSN.

Often, rather than using a candidate key as the PK, a surrogate key is created instead. This is because decisions around what candidate key to use can be found to be erroneous later, which can cause a huge headache (literally).

Another reason is that a surrogate key can be created using an efficient data type for indexing purposes, which the candidate keys may not have (e.g., a UserImage).

A third reason is that many ORMs work only with a single-column PK, so candidate keys composed of more than one column (composite keys) are ruled out in that case.

Something that many developers do not realize is that selecting a surrogate key over a natural key may be a compromise in terms of data integrity. You may be losing some constraints on your data by selecting a surrogate key, and often a trigger is required to simulate the constraint if a surrogate key is chosen.

RedFilter
Your last para. **Only** if there is data duplication/lack of Normalisation/Update Anomalies. And not otherwise.
PerformanceDBA
And if you are sure that set of columns is really a candidate key, you can throw an UNIQUE constraint at it - independently of the PK. Every decent RDBMS have it...
Fabricio Araujo
A: 

A "candidate key" is a candidate in exactly the same sense that candidates for President are candidates: They are the choices presented to you, from which you must choose one. They are candidates for the primary key. Pick one. In practice, not all choices are necessarily equally good.

Jay
Not really, because there's no fundamental reason why you have to "choose" one key over any other. The idea of having one supposedly "primary" key is essentially an arbitrary choice that doesn't make much difference either in principle or in practice.
dportas
@dportas: Hmm. You can, of course, have multiple keys on a table. But you must choose at most one to be the primary key. At least, on any actual database engine that I've ever used. Besides, I think the word "primary" pretty much implies only one.
Jay
@Jay: It was your presidential analogy I wasn't happy wit. Choosing a president implies that special functions and purposes will apply uniquely to that choice. Choosing a PK is more like choosing a prize bull from among a herd. You still have the same number of bulls and their other attributes are unchanged - so it doesn't really matter if you choose zero, one or more to be "primary"!
dportas
@dportas: Well, okay, after you choose a primary key, the other candidate keys are still keys, just none of them are "primary". After you choose a president, the other candidates tend to disappear into obscurity. (Quick: How many losing presidential candidates from more than 2 elections ago can you name?) In that sense the analogy is imperfect.
Jay
+1  A: 

Candidate key is a concept that appears when you are designing a database system.

Suppose your system will have a table named User, defined as below:

User (fullName, socialSecurityNumber, creditCardNumber, age).

Well, you have to choose which subset of these columns will be your primary key. The goal when designing the database is, of course, to keep this set minimal. You wouldn't use the pair (SSN, creditCardNumber) if the SSN alone already guarantees uniqueness.

Now, suppose that fullName, SSN and creditCardNumber are all fields that you know, somehow, that are unique for all users. You could use any of those as your PK, so they are all candidate keys (whereas age, on the other hand, is not).
Which will you choose? That will depend on factors such as the datatype of the field (it's preferable to set an index on an integer column rather than on an varchar column, for example).

dsetton
A: 

Candidate keys usually refer to those columns which could potentially be selected as the natural primary key. However, natural primary keys are also often a bad idea because they are unique but not unchanging (think of the havoc of changing millions of child rows because a company name changed) or because they are less efficient in joins than surrogate keys. Further, in real life, many potential candidate keys are not nearly stable enough for a true PK and are not as unique as we think. Emails, for instance, can be reused after an account is closed.

You add other indexes to specify uniqueness in order to maintain data integrity. For instance in your example, you are using a surrogate key to ensure uniqueness of a record. But this does not ensure that the person/phone combination is entered only once. So you would want to create a unique index for any possible candidate key (and this can consist of one or more fields) in the data if you are using a surrogate key. This ensures that the items which need to be unique are and allows you to get the performance benefits of a surrogate key.

Further sometimes there are mulitple fields or combinations of fields which should be unique. For instance suppose you have a table storing organization structure. Each organizational entity should be unique, so should each organization/person combination (assuming that there is no job sharing). By placing unique indexes on these fields, you are able to prevent bad data from being inserted into the table.

HLGEM
+1  A: 

Candidate key means exactly the same thing as primary key. All candidate keys serve the same purpose: to ensure data integrity by preventing duplicate data. It's perfectly sensible for a table to have more than one candidate key enforced in it if that's necessary to ensure some data is not duplicated.

Note that minimal in this context does NOT mean smallest. It means irreducible. i.e. no attribute can be removed from the key while still maintaining its uniqueness.

dportas