views:

172

answers:

12

Is there a benefit to having a single column primary key vs a composite primary key?

I have a table that consists of two id columns which together make up the primary key.

Are there any disadvantages to this? Is there a compelling reason for me to throw in a third column that would be unique on its own?

+2  A: 

I have had to use multi-column primary keys in the past, and it became quite a nightmare very quickly.

If you have one table that references your first table, how does it contain that primary key? Now add another table that references only the second table but needs to find data in the first. Now another... on down the rabbit hole.
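To make the rabbit hole concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `enrollment`/`grade` tables and their columns are hypothetical. The child has to copy both key columns, its own key grows to three columns, and every join back to the parent needs one condition per key column.

```python
import sqlite3

def build_schema() -> sqlite3.Connection:
    """Create a parent with a two-column key and a child that references it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE enrollment (
            student_id INTEGER NOT NULL,
            course_id  INTEGER NOT NULL,
            PRIMARY KEY (student_id, course_id)
        );
        -- The child must copy BOTH key columns to point at one enrollment row,
        -- and its own key grows to three columns.
        CREATE TABLE grade (
            student_id INTEGER NOT NULL,
            course_id  INTEGER NOT NULL,
            term       TEXT    NOT NULL,
            score      INTEGER,
            PRIMARY KEY (student_id, course_id, term),
            FOREIGN KEY (student_id, course_id)
                REFERENCES enrollment (student_id, course_id)
        );
    """)
    return conn

conn = build_schema()
conn.execute("INSERT INTO enrollment VALUES (1, 10)")
conn.execute("INSERT INTO grade VALUES (1, 10, '2010-spring', 90)")

# Joining back to the parent needs a condition on every key column.
rows = conn.execute("""
    SELECT e.student_id, e.course_id, g.score
    FROM grade AS g
    JOIN enrollment AS e
      ON e.student_id = g.student_id
     AND e.course_id  = g.course_id
""").fetchall()
print(rows)  # [(1, 10, 90)]
```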

If you know that you will only have the one table, there's probably not an issue either way; use whichever represents your data better. But if you'll be using it in joins, you can lose performance pretty quickly.

AllenG
A: 

In the databases I know (MySQL, PostgreSQL), a composite primary key automatically gets an index. So if you declare your key as composite, the DB should give you an efficient way to look up rows by that key. I believe this is the case for all major DBs, so I don't think you need to worry about lookup performance there.
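SQLite behaves the same way, which can be checked with a small sketch using Python's built-in sqlite3 module (the `pair` table is hypothetical). `PRAGMA index_list` reports the index SQLite creates to back the composite primary key, and `EXPLAIN QUERY PLAN` shows a lookup on the full key using it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pair (
        a_id INTEGER NOT NULL,
        b_id INTEGER NOT NULL,
        PRIMARY KEY (a_id, b_id)
    )
""")

# PRAGMA index_list returns (seq, name, unique, origin, partial);
# origin 'pk' marks the index SQLite created to back the PRIMARY KEY.
indexes = conn.execute("PRAGMA index_list(pair)").fetchall()
pk_indexes = [row for row in indexes if row[3] == "pk"]
print(pk_indexes)

# The query planner can use that index for a lookup on the full key.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM pair WHERE a_id = ? AND b_id = ?", (1, 2)
).fetchall()
print(plan)
```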

jdehaan
+8  A: 

Database Normalization nuts will tell you one thing.

I'm just going to offer my own opinion of what I've learned over the years. I stick an auto-incrementing ID field on Every ($&(@#$)# one of my tables. It makes life a million times easier in the long run to be able to single out a single row with impunity.

This is from a "down in the trenches" developer.

Caladain
+1 for the emphasis. I couldn't agree more.
David Stratton
That way you can ignore all the duplicate garbage rows? Auto-inc is a virus. Once one application layer expects it, it becomes a nightmare to support proper composite keys. Let me guess, though: you probably use unique keys. Although you do not call them your primary key, that is what is keeping your table sane. -- D.N. nut
nate c
-1 for the wrong.
Recurse
-1 for referring to anyone who disagrees with you as a "nut"
Tom H.
People are free to disagree with me. But just as I'll call out architecture astronauts, I'll call out someone who harps non-stop on DN, or any other single technique, as the solution to every problem. Is sticking an auto-incrementing field on every table a "virus"? No. It's solved more headaches than it's created *in my experience*. Your mileage may vary. But I can't tell you the number of times I've had a "perfect database" design completely fail to let me (or the guy who designed it) pick out a single row to do a single update based on his magical multi-column composite key.
Caladain
I'll also call anyone whose first response to a Windows problem is to cook up a COM client/server a nut. KISS. But hey, 99.99999% of the time the auto-incrementing field just sits there and does nothing. It's one of the few things I'll carry on my back for that *one* time when the balls are to the wall, just like I keep regular expressions in my back pocket for the few times they *solve the problem* elegantly.
Caladain
@Caladain: How could a composite key fail to allow you to pick out a single row? A key (any key) does exactly that - identify a single row. I'd say that if you can't do that then either your software is broken or you are doing something badly wrong.
dportas
It wasn't me who designed it. The problem that's hit me twice is that the database designer didn't do a good job. Four months into production, keys suddenly start matching multiple rows, along with other such nastiness, across millions of entries. The last time, it happened a week in, which was much better than the first time. Lots of similar-looking data, etc. That's why I offered my original post as my opinion only. It's not "proper", but since I've demanded it, it has solved more issues than it has caused. The DB guys complain, but the incident reports support Software's point of view. ~shrug~ Your mileage may vary.
Caladain
It basically allows me to quickly whip up a Python script and have a guaranteed unique key per row per table. We don't use the auto-incrementing field for any purpose other than when the fecal matter hits the oscillating rotational device. As I said, my opinion only. I'm not a DB guy by trade (plain-jane SQL for me; I'm nowhere near as good at creating these 12-join queries as some people are (exaggerating a bit), but I could do a swinging job if need be), just software (web, embedded, Windows GUI, Linux Qt, etc.).
Caladain
So the database design didn't enforce those "keys". That is bad design. But it has nothing to do with composite keys - it just means someone left out the relevant constraints. So you are blaming the wrong culprit and you are not correct to blame "normalization nuts" for your problem which had absolutely nothing to do with normalization.
dportas
Of course it has nothing to do with "normalization" at all. The normalization nuts are the people (they exist under different names in software and systems as well) who prattle on and on in meetings about needing to completely redesign the entire database to normalize it any time a new field is added. I was poking fun while mixing in a bit of experience, AND coating the whole thing from the outset with "This is just my opinion". Nothing more. The whole tone of my OP should convey that as well :-P
Caladain
@Caladain Actually, the whole tone of your response was that of an object-oriented software developer who thinks he can answer a database question just because he happens to use SQL to interact with his object-store. As the comment above said, if you get multiple rows back it is because either the DB design is broken, your code is broken, or, just as likely, your understanding of the domain is broken. Either way, introducing a surrogate key to avoid the need to think about uniqueness, equivalence, and identity is a dodgy kludge.
Recurse
A: 

Don't use multi-column keys. They get very difficult to maintain, especially if the components of the key are not human-understandable.

Use an internally generated key instead.

Matthew Jones
This is poor advice without explaining the use case. An internally generated key won't prevent the duplicate data that the multi-column key would prevent. So if the integrity of those unique columns is important to maintain, an internally generated key simply will not help you. The two achieve quite different things.
dportas
+2  A: 

Is there a benefit to having a single column primary key vs a composit[sic] primary key?

Yes. If the primary key also happens to be the clustered index, it is common for the clustered index key to be duplicated in full in each secondary index on the table. Therefore, a fatter clustered index, which is what one would get with a composite key, implies an increased storage cost. Also, foreign references to this table would need to specify both fields to refer to a unique entry, which implies a further storage cost. There is also an arguably greater cost in development time, because the joins become slightly more complex.
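A rough back-of-the-envelope sketch of the storage point, assuming each secondary-index entry carries a full copy of the clustered key (as InnoDB and SQL Server do for clustered tables) and ignoring page overhead, compression, and alignment. The row count, index count, and key widths are hypothetical.

```python
def secondary_index_key_overhead(rows: int, secondary_indexes: int,
                                 clustered_key_bytes: int) -> int:
    """Bytes spent repeating the clustered key inside secondary indexes.

    Simplified model: every secondary-index entry stores one full copy of
    the clustered key; real engines add per-entry and per-page overhead.
    """
    return rows * secondary_indexes * clustered_key_bytes

# Hypothetical sizing: 10 million rows, 3 secondary indexes.
single = secondary_index_key_overhead(10_000_000, 3, 4)     # one 4-byte INT key
composite = secondary_index_key_overhead(10_000_000, 3, 8)  # two 4-byte INT columns
print(single, composite, composite - single)  # 120000000 240000000 120000000
```

Under this model the composite key doubles the space the key occupies in secondary indexes; whether that matters depends on the table size, which is the same trade-off other answers raise about saving "a byte here and there".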

On the other hand, depending on the distribution of the values of your two key fields, concurrent access to your table may be greatly improved, because chronologically successive inserts could land on different physical pages. This could be the case, for example, if your fields are time-independent and, unlike an auto-incrementer, non-monotonic, such as clientID. This could be significant for performance in a high-concurrency environment.

I have a table that consists of two id columns which together make up the primary key.

Are there any disadvantages to this? Is there a compelling reason for me to throw in a third column that would be unique on its own?

If the most common way in which your table is queried is to specify those three fields as restrictions, then having all three in a composite key would likely be the fastest lookup.

And there is another important point that I almost forgot. Since a composite key means that foreign references to this table from other tables must include all fields in the key, some queries on the other table that need a restriction on one or more parts of this table's composite key can be performed without a join. This could be considered similar to denormalization for the sake of performance (arguably sacrificing a little ease of maintenance).
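A minimal sketch of the join-free case, using Python's built-in sqlite3 module; the `account`/`txn` tables are hypothetical. Because `txn` carries `client_id` as part of its composite foreign key, filtering transactions by client needs no join. With a surrogate `account_id` in `txn` instead, the same question would require joining to `account`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (
        client_id  INTEGER NOT NULL,
        account_no INTEGER NOT NULL,
        PRIMARY KEY (client_id, account_no)
    );
    CREATE TABLE txn (
        txn_id     INTEGER PRIMARY KEY,
        client_id  INTEGER NOT NULL,
        account_no INTEGER NOT NULL,
        amount     INTEGER NOT NULL,
        FOREIGN KEY (client_id, account_no)
            REFERENCES account (client_id, account_no)
    );
    INSERT INTO account VALUES (7, 1), (7, 2), (8, 1);
    INSERT INTO txn (client_id, account_no, amount)
        VALUES (7, 1, 100), (7, 2, 50), (8, 1, 25);
""")

# client_id is already in txn as part of the composite foreign key,
# so "all transactions for client 7" is answered from txn alone.
total = conn.execute(
    "SELECT SUM(amount) FROM txn WHERE client_id = ?", (7,)
).fetchone()[0]
print(total)  # 150
```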

cjrh
Normalization has nothing to do with whether a key is composite or not. Composite keys therefore have nothing to do with denormalization. Apart from that I was with you until the last sentence :)
dportas
@dportas: it can be argued that foreign key references that duplicate multiple fields (the composite key) in order to refer to a single record are "a bit like" denormalizing for the sake of performance, since they avoid the join under certain circumstances. I'm being a bit loose with a simile there, not trying to explain normal-form theory.
cjrh
@dportas: you're right, I was too strong in the original; post edited to reflect the metaphor more clearly.
cjrh
+2  A: 

Single column keys are simple to write, simple to maintain, and simple to understand.

If you're going to have a huge number of rows - billions? - maybe saving a byte here and there will help.

But if you're not looking at extreme cases, optimizing for "simple" is often the best way to go.

Dean J
A: 

Imagine you have a composite primary key (field1 and field2, for example) instead of just one auto-increment identifier. Client requirements change often; suppose that after some development the client says field2 is not compulsory and can be NULL. Since primary key columns cannot be NULL, field2 can no longer be part of the table's primary key. Now imagine this table is one of the most important in your model: if field2 has to leave the composite primary key, every foreign key that references it must change as well. Changing the primary key all over the model is a nightmare.
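A small sketch of why the nullable requirement breaks the key, using Python's built-in sqlite3 module (the `item` table is hypothetical). Standard SQL requires primary key columns to be NOT NULL; SQLite enforces that for `WITHOUT ROWID` tables, while ordinary SQLite rowid tables tolerate NULLs in a non-INTEGER primary key for historical reasons, so the sketch uses `WITHOUT ROWID`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID makes SQLite enforce NOT NULL on every PRIMARY KEY column,
# matching standard SQL behaviour.
conn.execute("""
    CREATE TABLE item (
        field1 INTEGER,
        field2 INTEGER,
        PRIMARY KEY (field1, field2)
    ) WITHOUT ROWID
""")

def insert_allowed(conn: sqlite3.Connection, row: tuple) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO item VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok_full = insert_allowed(conn, (1, 1))
ok_null = insert_allowed(conn, (1, None))  # the new "field2 is optional" case
print(ok_full, ok_null)  # True False
```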

Also, if there are a lot of foreign keys, I don't think it's a very good idea to add several columns to each table just to make the link.

Javi
A: 

I'm not sure there's enough information for us to make your call for you. Here are a few observations that might be helpful though.

Is the primary key a clustered index? Is the table referenced by other tables through a foreign key? If so, you may benefit from a single-column key, because that key will be copied into those other tables. That is where you would save space.

If the table is not referenced by other tables, you would be using extra space in your table without much additional benefit. And if this table contains only the two columns now, adding a third column of the same size would increase the table size by about 50%.

If you use an extra column for the primary key, do not forget your natural key (the two-column key). Create a unique constraint on the composite key. You still want to maintain the integrity of the real data.
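A minimal sketch of that advice, using Python's built-in sqlite3 module; the `link_loose`/`link_strict` tables are hypothetical. A surrogate key on its own accepts the same natural-key pair twice; adding a UNIQUE constraint on the two original columns restores the integrity check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate key only: nothing stops duplicate (a_id, b_id) pairs.
    CREATE TABLE link_loose (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        a_id INTEGER NOT NULL,
        b_id INTEGER NOT NULL
    );
    -- Surrogate key plus a UNIQUE constraint on the natural two-column key.
    CREATE TABLE link_strict (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        a_id INTEGER NOT NULL,
        b_id INTEGER NOT NULL,
        UNIQUE (a_id, b_id)
    );
""")

def try_insert(table: str, a_id: int, b_id: int) -> bool:
    """Return True if the row was accepted, False if a constraint rejected it.

    `table` is interpolated directly, so only pass trusted constant names.
    """
    try:
        conn.execute(f"INSERT INTO {table} (a_id, b_id) VALUES (?, ?)", (a_id, b_id))
        return True
    except sqlite3.IntegrityError:
        return False

loose = [try_insert("link_loose", 1, 2) for _ in range(2)]
strict = [try_insert("link_strict", 1, 2) for _ in range(2)]
print(loose, strict)  # [True, True] [True, False]
```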

bobs
A: 

The decision should always be based on requirements and the intended meaning of the data. A table with only a single-attribute key clearly enforces a different kind of constraint, and implies that your table has a very different meaning from the same table with a multi-attribute key. On the other hand, adding an additional unique column would also be a waste of resources and add meaningless complexity if you don't actually need to use it anywhere.

dportas
+2  A: 

If you are a coder and the database is nothing to you but a glorified object-store, then sure, by all means inject surrogate keys willy nilly. In fact go one better and just delegate all DB schema design and DB interaction to your favourite ORM and be done with it. Indeed, when I want a small or medium scale object-store, that's exactly what I do.

If you are approaching an information systems or information management problem, then it is a completely different story. When you start dealing with tens (or more likely hundreds) of millions of dirty records integrated from multiple sources, several or all of which are not under your control, the seductive lure of an easy answer to the problem of 'identity' becomes a trap.

Yes, you sometimes still introduce a surrogate key internally to allow for concise FK relationships and improved cache efficiency on covering indices, but you gain those benefits at the cost of substantial pain in managing the natural-key/surrogate-key relationship.

In this case it is important to make sure you don't allow the surrogate key to leak. Your public APIs at the business-logic layer should use the natural key; nothing above a document/record cache should be aware that a surrogate key exists. Be aware that the cost of matching updates against the existing surrogate keys can be prohibitive, and a far larger scalability hit than the incremental cost of moving a few extra bytes per request over the internal network.
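A minimal sketch of keeping the surrogate key internal, using Python's built-in sqlite3 module. The `customer` table and the `save_customer`/`get_customer_name` functions are hypothetical: callers only ever pass the natural key `(source, external_ref)`, and the lookup-then-write inside `save_customer` is the matching step whose cost the answer warns about. It is not concurrency-safe as written; a real system would need a transaction or an upsert.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        id           INTEGER PRIMARY KEY,   -- internal surrogate, never returned
        source       TEXT NOT NULL,         -- natural key, part 1
        external_ref TEXT NOT NULL,         -- natural key, part 2
        name         TEXT NOT NULL,
        UNIQUE (source, external_ref)
    )
""")

def save_customer(source: str, external_ref: str, name: str) -> None:
    """Business-layer write addressed by natural key only."""
    row = conn.execute(
        "SELECT id FROM customer WHERE source = ? AND external_ref = ?",
        (source, external_ref),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO customer (source, external_ref, name) VALUES (?, ?, ?)",
            (source, external_ref, name),
        )
    else:
        # Matching the incoming natural key to the existing surrogate id.
        conn.execute("UPDATE customer SET name = ? WHERE id = ?", (name, row[0]))

def get_customer_name(source: str, external_ref: str) -> Optional[str]:
    """Business-layer read addressed by natural key only."""
    row = conn.execute(
        "SELECT name FROM customer WHERE source = ? AND external_ref = ?",
        (source, external_ref),
    ).fetchone()
    return None if row is None else row[0]

save_customer("crm", "C-001", "Acme")
save_customer("crm", "C-001", "Acme Ltd")  # matched to the same row, not a new one
print(get_customer_name("crm", "C-001"))   # Acme Ltd
print(conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0])  # 1
```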

So in conclusion:

  1. If the DB is just being used as an object-store: let the ORM worry about object identity, and there should almost certainly be a surrogate key.

  2. If the DB is being used as a database: the introduction of a surrogate key is an engineering design decision with serious tradeoffs in both directions. The decision will need to be made on a case-by-case basis, with full recognition of the costs to be accepted in exchange for the benefits gained either way.

Update

The 'convenience' of a surrogate key is really just the ability to punt on the question of identity. This is often necessary in a database, and reasonable in the caching layer as I allow, but beyond that it leads to brittle data designs. The problem is that identity is not something that has one correct answer. For non-trivial data-intensive systems you will routinely find yourself needing to work in terms of equivalence classes, rather than the reference identity that object-oriented programming lulls us into thinking is normal.

What it really comes down to is the realization that the whole concept of a 'primary key' is a fiction invented to help the relational model work efficiently, and adopting a surrogate key cements that fiction and makes the whole system brittle and inflexible. Business logic needs to be able to provide its own definitions of equality: sometimes four copies of the same file need to be considered four files; sometimes they should be considered indistinguishable from the original file. When you edit one of them, is that then a new file? The same file? The answer to both questions is of course yes, when... Working with natural keys provides this critical ability to work in terms of conceptual equivalence classes. If you let surrogate keys infect your business logic, you quickly lose it.
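A minimal Python sketch of the file example, with made-up paths and contents. The same four records form four equivalence classes when identity is the location and one class when identity is the content; which answer is right depends on the business question, roughly the flexibility the answer says natural keys preserve and a single surrogate id hides.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List

@dataclass(frozen=True)
class FileCopy:
    path: str
    content: bytes

# Hypothetical data: four copies of the same file in different places.
copies = [
    FileCopy("/a/report.txt", b"Q3 numbers"),
    FileCopy("/b/report.txt", b"Q3 numbers"),
    FileCopy("/c/report-copy.txt", b"Q3 numbers"),
    FileCopy("/d/report.txt", b"Q3 numbers"),
]

def classes(items: List[FileCopy],
            key: Callable[[FileCopy], Hashable]) -> Dict[Hashable, List[FileCopy]]:
    """Group items into equivalence classes under the given key function."""
    groups: Dict[Hashable, List[FileCopy]] = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    return groups

by_location = classes(copies, lambda f: f.path)
by_content = classes(copies, lambda f: hashlib.sha256(f.content).hexdigest())
print(len(by_location), len(by_content))  # 4 1
```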

Recurse
+1, though I'm not sure that in practice it's really always absolutely necessary to prevent the business layer from knowing the surrogate key, as long as it's understood that that key is just an abstract representation of the data and not the real thing. You lose a lot of the convenience of surrogate keys if you don't allow your business objects to use them. Sometimes the real data just isn't that easy to pass around.
John M Gant
A: 

In general I prefer to have a surrogate key because there are very few truly good natural keys (the key problem is not uniqueness but that they change over time), and the longer the natural key, the more it affects performance when used as a PK. If you have a natural key, you should create a unique index on it and then use the surrogate key as the PK for joining to other tables. That enforces the uniqueness of the natural key data but fixes the problems of join performance and of the extra time needed to update all child records when the natural key changes.
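A minimal sketch of that layout, using Python's built-in sqlite3 module; the `product`/`order_line` tables and SKU values are hypothetical. The natural key (`sku`) stays unique, children reference the surrogate `id`, so renaming the SKU touches exactly one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE product (
        id  INTEGER PRIMARY KEY,          -- surrogate key used for joins
        sku TEXT NOT NULL UNIQUE          -- natural key, still enforced
    );
    CREATE TABLE order_line (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product (id),
        qty        INTEGER NOT NULL
    );
    INSERT INTO product (id, sku) VALUES (1, 'OLD-123');
    INSERT INTO order_line (product_id, qty) VALUES (1, 5), (1, 2);
""")

# The natural key changes: one parent row is updated; child rows are
# untouched because they reference the surrogate id, not the SKU.
updated = conn.execute(
    "UPDATE product SET sku = 'NEW-123' WHERE sku = 'OLD-123'"
).rowcount
print(updated)  # 1

rows = conn.execute("""
    SELECT p.sku, SUM(ol.qty)
    FROM order_line AS ol JOIN product AS p ON p.id = ol.product_id
    GROUP BY p.sku
""").fetchall()
print(rows)  # [('NEW-123', 7)]
```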

There is one case where I ignore this, and that is a joining table. If it is a table that is used to enforce a many-to-many relationship and consists only of two surrogate keys from other tables, then you really gain nothing from adding a surrogate key. Typically the individual keys are used for joins, not the PK, and surrogate keys almost never change. In a joining table, I just add the two columns I need and nothing else.
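A minimal sketch of such a joining table, using Python's built-in sqlite3 module; the `author`/`book`/`book_author` tables are hypothetical. The two foreign keys together form the primary key, which also rejects a duplicate pairing, and queries join through the individual columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE book   (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Joining table: just the two surrogate keys, which together form the PK.
    CREATE TABLE book_author (
        book_id   INTEGER NOT NULL REFERENCES book (id),
        author_id INTEGER NOT NULL REFERENCES author (id),
        PRIMARY KEY (book_id, author_id)
    );
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book   VALUES (10, 'SQL Basics');
    INSERT INTO book_author VALUES (10, 1), (10, 2);
""")

duplicate_rejected = False
try:
    conn.execute("INSERT INTO book_author VALUES (10, 1)")  # same pair again
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the composite PK blocks the repeat

# Joins go through the individual key columns, not a separate row id.
names = [r[0] for r in conn.execute("""
    SELECT a.name FROM book_author AS ba
    JOIN author AS a ON a.id = ba.author_id
    WHERE ba.book_id = 10 ORDER BY a.name
""")]
print(duplicate_rejected, names)  # True ['Ann', 'Bob']
```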

HLGEM
A: 

One caveat to the auto-incrementing column is that it can give a false impression of uniqueness. Sure, your identity column is always unique, but that's just a meaningless value you've attached to the table. Unless you also have a unique constraint attached to the set of columns that represent the actual semantic primary key of the table, you have no guarantee of meaningful uniqueness.

John M Gant