views:

1046

answers:

10

Should I always have a primary key in my database tables?

Let's take the SO tagging. You can see the tag in any revision, its likely to be in a tag_rev table with the postID and revision number. Would I need a PK for that?

Also since it is in a rev table and not currently use the tags should be a blob of tagIDs instead of multiple entries of multiple post_id tagid pair?

+6  A: 

A table should have a primary key so that you could identify each row uniquely with it.

Technically, you can have tables without a primary key, but you'll be breaking good database design rules.

Mehrdad Afshari
In a legacy application I maintain there are a few tables that have only one row each, by design. And some tables that will have no more than ten or so rows (lookups of one kind or another). There is utterly no sense of having a primary key in such a situation. At least in my opinion, anyway.
Cyberherbalist
@Cyberherbalist: Even in that case, if you are using that table programmatically, you might need a way to identify each row separately. Also note that having a primary key does not mean having a *separate* primary key *column*. You might have a combination of columns as primary key. BTW, it's perfectly valid to skip patterns and rules when you think they don't make sense in your specific situation but for vast majority of cases, not having a primary key hurts.
Mehrdad Afshari
Exactly. I thought WTF when I read our (UK) Secondary School's "Databases" booklet, for MS Access. Quote - "If Access asks you to define a primary key, click No."
Lucas Jones
@Mehrdad: I agree, generally. In the case I mention, for the small multi-row tables, there is always some column that serves as a de-facto key, even if it isn't defined as a PK. Programmatically accessing these lookups does not seem to impact the operation of the application. For the few single-row tables there is no point of a key of any kind. @person-b: I would have thought WTF as well!
Cyberherbalist
Cyberherbalist: What about cases where the small table is referenced from another table, wouldn't you want a primary key then? (to use as a foreign key).
+4  A: 

See this related question about whether an integer primary key is required. One of the answers uses tagging as an example:

Are there any good reasons to have a database table without an integer primary key

For more discussion of tagging and keys, see this question:

Id for tags in tag systems

Pete TerMaat
However, this is not the question. @acidzombie is not asking about a non-integer PK, but the general case of whether or not to have a primary key of any type (i.e. string).
Cyberherbalist
Thanks, I edited to clarify how the questions relate.
Pete TerMaat
A: 

Databases don't have keys, per se, but their constituent tables might. I assume you mean that, but just in case...

Anyway, tables with a large number of rows should absolutely have primary keys; tables with only a few rows don't need them, necessarily, though they don't hurt. It depends upon the usage and the size of the table. Purists will put primary keys in every table. This is not wrong; and neither is omitting PKs in small tables.

Cyberherbalist
A: 

You should strive to have a primary key in any non-trivial table where you're likely to want to access (or update or delete) individual records by that key. Primary keys can consist of multiple columns, and formally speaking, will be the shortest available superkey; that is, the shortest available group of columns which, together, uniquely identify any row.

I don't know what the Stack Overflow database schema looks like (and from some of the things I've read on Jeff's blog, I don't want to), but in the situation you describe, it's entirely possible there is a primary key across the post identifier, revision number and tag value; certainly, that would be the shortest (and only) superkey available.

With regards to your second point, while it may be reasonable to argue in favour of aggregating values in archive tables, it does go against the principle that each row/column intersection in a table ought to contain one single value. While it may slightly simplify development, there is no reason you can't keep to a normalised table with versioned metadata, even for something as trivial as tags.

Rob
A: 

If there is no PK, how will you update or delete a single row ? It would be impossible ! To be honest I have used a few times tables without PK, for instance to store activity logs, but even in this case it is advisable to have one because the timestamps could not be granular enough. Temporary tables is another example. But according to relational theory the PK is mandatory.

Lluis Martinez
Impossible? Nah. Rows in a small table can be identified uniquely without a PK. If it is a small lookup table (where I see the most use of PK-less tables), there is always one value that could be the key, but at any rate serves as a way to uniquely identify the row even without a PK (or you can edit the row by opening the table using SSMS). But it would seem to be more efficient to use a primay key and a Sql command.
Cyberherbalist
@Lluis, you use DELETE...LIMIT 1 or UPDATE...LIMIT 1.@Cyberherbalist, rows cannot always be identified uniquely since there is nothing stopping all column values being the same, but you can limit the number affected by a query.
Draemon
A: 

it is good to have keys and relationships . Helps a lot. however if your app is good enough to handle the relationships then you could possibly skip the keys ( although i recommend that you have them )

A: 

Since I use Subsonic, I always create a primary key for all of my tables. Many DB Abstraction libraries require a primary key to work.

Note: that doesn't answer the "Grand Unified Theory" tone of your question, but I'm just saying that in practice, sometimes you MUST make a primary key for every table.

Matt Dawdy
A: 

I firmly believe every table should have a way to uniquely identify a record. For 99% of the tables, this is a primary key. For the rest you may get away with a unique index (I'm thinking one column look up type tables here). Any time I have a had to work with a table without a way to uniquely identify records, there has been trouble.

I also believe if you are using surrogate keys as your PK, you should, where at all possible, have a separate unique index on whatever combination of fields make up the natural key. I realize there are all too many times when you don't have a true natural key (names are not unique or what makes something unique might be spread across several parentchild tables), but if you do have one, please please please make sure it has a unique index or is created as the PK.

HLGEM
+3  A: 

I tend to agree that most tables should have a primary key. I can only think of two times where it doesn't make sense to do it.

  1. If you have a table that relates keys to other keys. For example, to relate a user_id to an answer_id, that table wouldn't need a primary key.
  2. A logging table, whose only real purpose is to create an audit trail.

Basically, if you are writing a table that may ever need to be referenced in a foreign key relationship then a primary key is important, and if you can't be positive it won't be, then just add the PK. :)

James Black
On point 1, you should have a composite primary key (user_id,answer_id). On point 2 I would expect a timestamp column, but if it was possible to have two identical rows (identical timestamps and message) I would definitely want to know that something had happened twice and be able to deal with those events separately. "13:15-doubled overdraft" is very different from "533-13:15-doubled overdraft,534-13:15-doubled overdraft" for example.
Draemon
@Draemon - If you have two columns in a table, having a composite key is overkill, IMO, as you really don't get any benefit. If you can have two events happen at the same time then you have bigger problems, if you don't expect two updates at the same time on the same record.
James Black
A: 

If it's a join table then I wouldn't say that you need a primary key. Suppose, for example, that you have tables PERSONS, SICKPEOPLE, and ILLNESSES. The ILLNESSES table has things like flu, cold, etc., each with a primary key. PERSONS has the usual stuff about people, each also with a primary key. The SICKPEOPLE table only has people in it who are sick, and it has two columns, PERSONID and ILLNESSID, foreign keys back to their respective tables, and no primary key. The PERSONS and ILLNESSES tables contain entities and entities get primary keys. The entries in the SICKPEOPLE table aren't entities and don't get primary keys.

lumpynose
I'd probably put a primary key (or at least a unique index) on the SICKPEOPLE table (including both columns) to make sure that you don't accidentally insert the same row twice.
Graeme Perrow
Good point. As you say, you could avoid the problem of duplicate inserts by adding a unique constraint or index on the combination of the PERSONID and ILLNESSID in the SICKPEOPLE table. At the momen I tend to favor not adding primary keys to tables like this because it helps make clear that it's not an entity.
lumpynose