views:

1370

answers:

10

There seems to only be 2nd class support for composite database keys in Java's JPA (via EmbeddedId or IdClass annotations). And when I read up on composite keys, regardless of language, people keep coming across as them being a bad thing. But I cannot understand why. Are composite keys still acceptable to use these days? If not, why not?

I've found one person who agrees with me: http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx

But another who doesn't: http://weblogs.java.net/blog/bleonard/archive/2006/11/using_composite.html

Is it just me, or are people not able to make the distinction of where a composite key is appropriate or not? I see composite primary keys useful when the table doesn't represent an entity - i.e. when it represents a join table.

A simple example:

Actor { Id, Name, Email } Movie { Id, Name, Year } Character { Id, Name } Role { Actor, Movie, Character }

Here Actor, Movie and Character obviously benefit from having an Id column as the primary key.

But Role is a Many-To-Many join table. I see no point in creating an id just to identify a row in the database. To be it seems obvious that the primary key is { Actor, Movie, Character }. It also seems like a rather limiting feature, especially if the data in the join table changes all the time, you could find yourself with primary key collisions once the primary key sequence wraps around to 0.

So, back to the original question, is it still acceptable practice to use composite primary keys? If not, why not?

A: 

In terms of the domain model, I see nothing wrong with creating a composite primary key when the table doesn't represent an entity - i.e. when it represents a join table (as you mention in your question), other than if it is not montonically increasing, then you will get a certain amount of page splits during insertions.

Some ORM's don't cope well with composite primary keys, so perhaps it is safer to create a surrogate auto-integer for the primary key, and cover the columns with a non-clustered index.

Mitch Wheat
+3  A: 

Similar questions have been asked on SO, and there is no consensus ;)

If you develop a web application, you will love single column pk's, as they make your URLs simpler.

For a sequence to wrap you'd need 2 billion records in a single table (32bit), or 10^18 with 64 bit pk's.

Btw, your data model does not allow for movie characters with unknown actors.

devio
+5  A: 

In my personal opinion you should avoid composite primary keys due to several reasons:

  1. Future changes: when you design a database you sometimes miss what in the future will become important. A significant example for this is thinking a combination of two or more fields is unique (and thus can become a primary key), whereas in the future you want to allow NULLs or other non-unique values in them. Having a single primary key is a good solid solution against such changes.

  2. Uniformity: If every table has a unique numerical ID, and you also maintain some standard as to its name (e.g. "ID" or "tablename_id"), the code and SQL referring to it is clearer (in my opinion).

There are other reasons, but these are just a few.

The main question I would ask is why not use a separate primary key if you have a unique set of fields? What's the cost? An additional integer index? That's not too bad.

Hope that helps.

Roee Adler
A: 

In Ruby for Rails, when not explicitly specifying otherwise, your Role table would be kind of like you described (if the columns are actually the IDs from the other tables). Still, in the database you might want to ensure unique combinations by defining a unique index on those three columns, if only to help the database optimizing your queries. With that unique index in place and the framework not using any other primary key anyway, there is no need for a an additional numeric primary key in your Role table. Having said that, the unique index could could very be defined as a composite primary key instead.

As for future changes: defining a strict database for your first iteration will prevent unexpected data to be persisted, which will make migrations much easier.

So: I would use composite primary keys.

Arjan
A: 

I would only ever use them in join tables. The only way to absolutely ensure that every record identifier is unique and consistent over time is to use a synthetic key.

Composite keys seem OK in theory, which is why they are tempting to use, but practice has shown that they usually indicate that there is a flaw in your data model. Worse still, in many cases they will fail to guarantee uniqueness, given a large enough data set. And data sets always grow over time, so using them may mean that you have planted a bomb in your application which will only explode when the application has been in production use for a while.

I think that people are underplaying ORMs. Every mainstream programming language has a defacto ORM, and has had for years, because they solve the fundamental incompatibility between OO and relational structures. Trying to write any complex, testable OO software against SQL databases without an ORM is very inefficient, at best.

Good ORMs also provide practices and tooling that make it much easier to create and maintain consistent high-quality database schema, so on average, a team will come out well ahead by working with an ORM. Handcrafting schema is rather like writing C++ ...people can do it, but in the real world it is so hard to maintain quality over time that the average product is not good.

Stuart Ellis
Actually, it's arguably so that having to resort to a synthetic primary key means that you have not done enough research to understand the domain you are trying to model. A synthetic primary key says nothing about the entity you are modeling, it's merely a number. A composite key is better since it is tied to the domain.
PatrikAkerstrand
"Composite keys... usually indicate that there is a flaw in your data model" -- I can seem how they could *sometimes* be indicative of a flaw but "usually"? No. Consider that you may be using a using surrogate key in your physical data model to 'hide' a composite keys that exists in your logical data modal.
onedaywhen
@Machine: I would argue that tying keys to the domain is risky, because any bit of data that humans use as an identifier is subject to change of format. Again, your app may work fine for literally years, and then somebody decides to change how payroll numbers work. I had to deal with a situation like this.
Stuart Ellis
@onedaywhen: I can only speak from experience, so YMMV, but mine has been the opposite of what Machine describes - every time somebody says "let's use a composite key" it usually means that there is a complex problem that they don't want to spend time on. I can only think of one example where using a composite still made sense, even after all the what-ifs, but that is now slightly questionable because there's an alternate set of identifiers for some of the items.
Stuart Ellis
I see your content here is mainly about Ruby on Rails. Honestly I'm surprised after reading "every language has a defacto ORM". .Net doesn't. Java doesn't. C/C++ don't. PHP doesn't (although the mysql package is probably most common). But Ruby does... How much experience have you got outside Ruby?
cletus
I meant defacto in the sense of being an established and more-or-less uncontroversial option for doing ORM on that (OO) language. Hibernate, NHibernate, SQLAlchemy. I'll eat crow and admit that I should have thought about PHP and C++ - I do .NET and Ruby.
Stuart Ellis
+1  A: 

I think there's no problem using a composite key.

To me the database it's a component on its own, that should be treated the same way we treat code : for instance we want clean code, that communicates clearly its intent, that does one thing and does it well, that doesn't add any uneeded level of complexity, etc.

Same thing with the db, if the PK is composite, this is the reality, so the model should be kept clean and clear. A composite PK it's clearer than the mix auto-increment + constraint. When you see an ID column that does nothing you need to ask what's the real PK, are there any other hidden things that you should be aware of, etc. A clear PK doesn't leave any doubts.

The db is the base of your app, to me we need the most solid base that we can have. On this base we'll build the app ( web or not ). So I can't see why we should bend the db model to conform to some specific in one development tool/framework/language. The data is directing the application, not the other way around. What if the ORM changes in the future and becomes obsolete and a better solution appears that imposes another model ? We can't play with the db model to fit this or that framework, the model should stay the same, it should not depend on what tool we're using to access the data ...

If the db model change in the future, it should change because functionality changed. If we would know today how this functionality will change, we'll be modeling this already. ANd any future change will be dealt with when the time comes, we can't predict for instance the impact on existing data, so one extra column doesn't guarantee that it will withold any future change ...

We should design for today's functionality, and keep the db model the simplest possible, this way it will be easy to change/evolve in the future.

Billy
+1  A: 

It's a religious thing. I use natural keys and shun surrogates. I have no problem with composite keys either in theory or in practice.

Only the most trivial logical model would involve no composite keys. Call me lazy but I see no need to complicate the data model by introducing surrogates into the physical model on implementation. Sure, I'd consider one on a table if performance issues were found but I take the same approach as for denormalization i.e. as a last resort. Habitually using surrogates amounts to premature optimization, IMO.

onedaywhen
+3  A: 

My general opinion is... no. don't use composite primary keys.

They will typically complicate ORMs if you use them (ORMs sometimes go so far as to call composite primary keys "legacy behaviour") and generally if you're using multiple keys, one or more of them will tend to be natural rather than technical keys, which for me is the bigger problem: IMHO you should certainly favour technical primary keys.

More on this in Database Development Mistakes Made by AppDevelopers.

cletus
+1  A: 

Religious wars have been, and still are, going on on this subject.

OO people have this zealous thing about "identity", and will tell you that the only thing that matters is the ability for you to "identify" "real-life objects" inside your programs, and that composite, "real-life" keys will only get you into trouble when trying to achieve that goal.

Data people have this thing about "uniqueness" that is perceived as "zealous" by the OO side, and will tell you that the only thing that matters is that if the business tells you that the combination of (values for) attribute X and attribute Y must be unique, then it is your job to see to it that the database enforces this business rule of uniqueness of the combined X+Y.

How you want your question answered is just a matter of which religion you prefer. My personal religion is the Data one. That religion has proven to be able to survive any hype and trend ever since 1969.

A: 

I have almost never seen a case where a composite key was a good idea (exception, joining table consisting of only two surrogate keys). In the first palce you are wasting space in the child tables. You are harming performance in the joins as integer joins are generally much faster. If you have the composite key as a clustered index (talking SQL Server here), then you are causing the database to be less efficient about storing records and less efficient in building other indexes - all of which use the clusterd index.

When the data in the key changes (As it almost inevitably will) then you need to update all related tables as well casuing massive unecessary updates and wasting processing power on a task that is completely uneeded when the database is designed to use surrogaste keys. Primary keys need not only to be unique but to be unchanging. Composite keys often fail the second test.

So you are thinking of using a technique that harms performance, causes poor use of memory and database storage, uses way more space in child records (another waste of resources) and requires painful updating of what may be millions of child records when things change. And which might make it hard to use an ORM? Why would you do that? Because you are too lazy to put a surrogate key on and then define a unique index on the potential composite key? Is there any gain at all to using a composite index? For the lack of 5 minutes of work you are permanently harming your database?

HLGEM