Can a Superkey include things that aren't part of the Primary key?
A superkey can contain non-unique identifiersor Primary Key. But like a compound primary key, the combination must be unique. A normalized dataset has tables that hold data about a particular entity relevant to the purpose of the database. So, e.g., a company with an employee database has a table of employees and might have another table of employee action history (promotions, reviews, salary adjustments, etc.)
A key is a candidate key if it qualifies to uniquely select a specific row in the table such that the data in that table depend completely on that key and nothing else is required.
Suppose the employee table is USA based. It might have two candidates - one might be an employee number, the other the employee's social security number. If a company requires employees to have SSNs..., this could happen.
OK, two candidates - employee number and SSN.
You must pick one to be the primary key, in which case the other becomes just another data field dependent on the choice of key.
In the USA, based on two factors, employee number would probably be picked as the PK leaving SSN as a secondary field. First, there is a legal restriction on use of an SSN because it is personally-identifying information that promotes identity theft. Second, it is possible that a person might have applied for an SSN but not yet gotten it. So SSN can't be relied upon, thus invalidating it as a candidate.
Therefore, the difference between primary keys and candidate keys is that the other candidates lost the race.
A SUPERKEY is essentially an overdetermined key, one that is guaranteed to be a unique selector - but that isn't the smallest it could be. In my employee table example, the combination of (employee number, SSN) would be a superkey. Note that the superkey will often include the primary key.
There is more to it than that. The definition speaks of a superkey that has the same cardinality as that of the whole record as a key. (The ultimate key, totally impractical since you have to know the whole record beforehand to use it as a key...)
"Cardinality" simply says, when I query a table on a single key value, what is the average number of returned records. For a proper prime key, this is always and only 1 for all values that exist in the table at all. (Obviously, it is zero for numbers that don't exist in the table.) Cardinality will be greater than one for non-unique keys. The usual example for the other end of the spectrum is for a table of size N, using "Gender" as a key will give you a cardinality of N/2. Remember, since it is the AVERAGE return size, the table's distribution doesn't have to be perfect.
Hope that helps.
Logically speaking, yes. If table X
has columns {A, B, C}
and A
is the primary key then {A}
, {A, B}
, {A, C}
and {A, B, C}
are all superkeys because if you have any one of those sets, you know all the values in the row (if it exists.)
However it's not treated as a key in SQL for some purposes, e.g. if table Y
has A
and B
you can not usually define a foreign key Y(A, B) REFERENCES X(A, B)
, because {A, B}
is not a primary key. If you want to be able to declare that foreign key you must add another UNIQUE
constraint on X(A, B)
which is inefficient as it duplicates part of the primary key.
In my opinion this is one of the many flaws of SQL.