views:

604

answers:

10

Although I'm guilty of this crime, it seems to me there can't be any good reason for a table to not have an identity field primary key.

Pros: - whether you want to or not, you can now uniquely identify every row in your table which previously you could not do - you can't do sql replication without a primary key on your table

Cons: - an extra 32 bits for each row of your table

Consider for example the case where you need to store user settings in a table in your database. You have a column for the setting name and a column for the setting value. No primary key is necessary, but having an integer identity column and using it as your primary key seems like a best practice for any table you ever create.

Are there other reasons besides size that every table shouldn't just have an integer identity field?

+5  A: 

Every table should have a primary key. It doesn't matter if it's an integer, GUID, or the "setting name" column. The type depends on the requirements of the application. Ideally, if you are going to join the table to another, it would be best to use a GUID or integer as your primary key.

Jesse Weigert
+11  A: 

Sure, an example in a single-database solution is if you have a table of countries, it probably makes more sense to use the ISO 3166-1-alpha-2 country code as the primary key as this is an international standard, and makes queries much more readable (e.g. CountryCode = 'GB' as opposed to CountryCode = 28). A similar argument could be applied to ISO 4217 currency codes.

In a SQL Server database solution using replication, a UNIQUEIDENTIFIER key would make more sense as GUIDs are required for some types of replication (and also make it much easier to avoid key conflicts if there are multiple source databases!).

Greg Beech
You're right, this is a good reason. For me, in this case, I will still use an autoincrementing PK and make a unique constraint on the country code. Also, you can have a non-PK GUID for replication. Just adding some thoughts. :)
BobbyShaftoe
@Bobby - We originally started with an INT PK on our Country table, but it got to be more pain than using the country code so got refactored out. And yep, you're right, the ROWGUIDCOL doesn't have to be the PK but IMHO it normally makes sense to make it so.
Greg Beech
I awe about about to declare that you should always use PKs, but I like your counter examples. These are basically external identifiers that will never be modified, so are valid as their own PKs.
jpeacock
I don't think readability of queries is a particularly good reason personally. While the ISO standard codes are probably as good an example as you will get, it's still more likely that the ISO code for a country will change than a surrogate key will change.
Draemon
+3  A: 

Yes, there are good reasons. You can have semantically meaningful true keys, rather than articificial identity keys. Also, it is not a good idea to have a seperate autoincrementing primary key for a Many-Many table. There are some reasons you might want to choose a GUID.

That being said, I typically use autoincrementing 64bit integers for primary keys.

BobbyShaftoe
+1  A: 

To be a properly normalised table, each row should only have a single identifiable key. Many tables will already have natural keys, such a unique invoice number. I agree, especially with storage being so cheap, there is little overhead in having an autonumber/identity key on all tables, but in this instance which is the real key.

Another area where I personally don't use this approach if for reference data, where typically we have a Description and a Value

Code, Description
'L', 'Live'
'O', 'Old'
'P', 'Pending'

In this situation making code a primary key ensures no duplicates, and is more human readable.

MrTelly
+3  A: 

Every table should have a primary key. But it doesn't need to be a single field identifier. Take for example in a finance system, you may have the primary key on a journal table being the Journal ID and Line No. This will produce a unique combination for each row (and the Journal ID will be a primary key in its own table)

Your primary key needs to be defined on how you are going to link the table to other tables.

Craig T
+2  A: 

I don't think every table needs a primary key. Sometimes you only want to "connect" the contents of two tables - via their primary key.

So you have a table like users and one table like groups (each with primary keys) and you have a third table called users_groups with only two colums (user and group) where users and groups are connected with each other.

For example a row with user = 3 and group = 6 would link the user with primary key 3 to the group with primary key 6.

Koraktor
The primary key in the linking users_groups table would be (User,Group) ensuring that there's no duplicate rows.
Mark Brackett
+1  A: 

One reason not to have primary key defined as identity is having primary key defined as GUIDs or populated with externally generated values.

In general, every table that is semantically meaningful by itself should have primary key and such key should have no semantic meaning. A join table that realizes many-to-many relationship is not meaningful by itself and so it doesn't need such primary key (it already has one via its values).

grigory
You mean, doesn't need an additional surrogate PK column? It should have a primary key constraint, comprising the keys of the other 2 tables.
Tony Andrews
Yes, thank you for clarifying.
grigory
+7  A: 

The most clear example of a table that doesn't need a surrogate key is a many-to-many relation:

CREATE TABLE Authorship (
  author_id INT NOT NULL,
  book_id   INT NOT NULL,
  PRIMARY KEY (author_id, book_id),
  FOREIGN KEY (author_id) REFERENCES Authors (author_id),
  FOREIGN KEY (book_id) REFERENCES Books (book_id)
);

I also prefer a natural key when I design a tagging system:

CREATE TABLE Tags (
  tag VARCHAR(20) PRIMARY KEY
);

CREATE TABLE ArticlesTagged (
  article_id INT NOT NULL,
  tag        VARCHAR(20) NOT NULL,
  PRIMARY KEY (article_id, tag),
  FOREIGN KEY (article_id) REFERENCES Articles (article_id),
  FOREIGN KEY (tag) REFERENCES Tags (tag)
);

This has some advantages over using a surrogate "tag_id" key:

  • You can ensure tags are unique, without adding a superfluous UNIQUE constraint.
  • You prevent two distinct tags from having the exact same spelling.
  • Dependent tables that reference the tag already have the tag text; they don't need to join to Tags to get the text.
Bill Karwin
given this layout, are db's smart enough to only store the text data once?
Michael Pryor
No, typically the string would be stored once in the Tags table and once in the ArticlesTagged table. You wouldn't choose this design if storage was a premium, you'd choose this design for the other advantages I mentioned.
Bill Karwin
Why have the Tags table at all?
Draemon
@Draemon: As a lookup table for permitted tags. You might not necessarily have articles that use every tag, but you still want to list the tags allowed, for instance to populate a menu in your user interface. Plus it's faster to `SELECT Tag FROM Tags` than `SELECT DISTINCT Tag FROM ArticlesTagged`.
Bill Karwin
A: 

A primary key is ALWAYS a good idea. It allows for very fast and easy joining of tables. It aides external tools that can read system tables to make join allowing less skilled people to create their own queries by drag-and-drop. It also makes the implementation of referential integrity a breeze and that is a good idea from the get go.

+1  A: 

The key difference (sorry) between a natural primary key and a surrogate primary key is that the value of the natural key contains information whereas the value of a surrogate key doesn't.

Why is this important? Well a natural primary key is by definition guaranteed to be unique, but its value is not usually guaranteed to stay the same. When it changes, you have to update it in multiple places.

A surrogate key's value has no real meaning and simply serves to identify that row, so it never needs to be changed. It is a feature of the model rather than the domain itself.

So the only place I would say a surrogate key isn't appropriate is in an association table which only contains columns referring to rows in other tables (most many-to-many relations). The only information this table carries is the association between two (or more) rows, and it already consists solely of surrogate key values. In this case I would choose a composite primary key.

If such a table had bag semantics, or carried additional information about the association, I would add a surrogate key.

Draemon