views:

288

answers:

9

I was working on an Access database which loved auto-numbered identifiers. Every table used them except one, which used a key made up of the first name, last name and birthdate of a person. Anyways, people started running into a lot of problems with duplicates, as tables representing relationships could hold the same relationship twice or more. I decided to get around this by implementing composite keys for the relationship tables and I haven't had a problem with duplicates since.

So I was wondering what's the deal with the bad rep of composite keys in the Access world? I guess it's slightly more difficult to write a query, but at least you don't have to put in place tons of checks every time data is entered or even edited in the front end. Are they incredibly super inefficient or something?

+1  A: 

It complicates queries and maintenance. If you are really interested in this subject I'd recommend looking over the number of posts that already cover this. This will give you better info than any one response here.

http://stackoverflow.com/search?q=composite+primary+key

Abe Miessler
Some examples of how it complicates queries and maintenance would probably be relevant.
Tom H.
+2  A: 

If you only use pure self-written SQL to access your data, they are OK.

However, some ORMs, adapters etc. require having a single PK field to identify a record.

Also note that a composite primary key is almost invariably a natural key (there is hardly a point in creating a surrogate composite key, you can as well use a single-field one).

The most common usage of a composite primary key is a many-to-many link table.

When using the natural keys, you should ensure they are inherently unique and immutable, that is an entity is always identified by the same value of the key, once been reflected by the model, and only one entity can be identified by any value.

This it not so in your case.

First, a person can change their name and even the birthdate

Second, I can easily imagine two John Smiths born at the same day.

The former means that if a person changes their name, you will have to update it in each and every table that refers to persons; the latter means that the second John Smith will not be able to make it into your database.

For the case like yours, I would really consider adding a surrogate identifier to your model.

Quassnoi
I recently read a story where two women with the identical first, middle and last names and birthdates were patients at a hospital at the same time.
Tony Toews
Is the requirement for non-Null in the composite fields a Jet/ACE-only restriction, or is it part of SQL theory? That's usually the killer for most natural keys, in my experience, as uniqueness can't be enforced if you have any Nulls anywhere in any of the fields.
David-W-Fenton
+7  A: 

A composite key works fine for a single table, but when you start to create relations between tables it can get a bit much.

Consider two tables Person and Event, and a many-to-many relations between them called Appointment.

If you have a composite key in the Person table made up of the first name, last name and birth date, and a compossite key in the Event table made up of place and name, you will get five fields in the Appointment table to identify the relation.

A condition to bind the relation will be quite long:

select Person,*, Event.*
from Person, Event, Appointment
where
  Person.FirstName = Appointment.PersonFirstName and
  Person.LastName = Appointment.PersonLastName and
  Person.BirthDate = Appointment.PersonBirthDate and
  Event.Place = Appointment.EventPlace and
  Event.Name = Appointment.EventName`.

If you on the other hand have auto-numbered keys for the Person and Event tables, you only need two fields in the Appointment table to identify the relation, and the condition is a lot smaller:

select Person,*, Event.*
from Person, Event, Appointment
where
  Person.Id = Appointment.PersonId and Event.Id = Appointment.EventId
Guffa
Yeah I've realized using composite keys like that is a bit of a hassle. But is there any downside to using a composite key in the Appointment table instead of an autonumber/unique identifier?
Jeff
@Jeff: I usually add an auto-number id to such a table too, even if it isn't used. If you have more information in the `Appointment` table and you can create more than one relation between the same person and event, then you would use a unique identifier, but if the table is only used to create the relation, you can use a composite key (or a constraint and an index).
Guffa
Please stop using that implied join syntax. It is a very very poor practice.
HLGEM
@HLGEM: Yes, you would usually use a `join`, but this worked better to demonstrate the point here. That's how we did joins before the `join` command existed.
Guffa
+2  A: 

If your RDBMS supports them and if you use them correctly (and consistently), unique keys on the composite PK should be sufficient to avoid duplicates. In SQL Server at least, you can also create FKs against a unique key instead of the PK, which can be useful.

The advantage of a single "id" column (or surrogate key) is that it can improve performance by making for a narrower key. Since this key may be carried to indexes on that table (as a pointer back to the physical row from the index row) and other tables as a FK column that can decrease space and improve performance. A lot of it depends on the specific architecture of your RDBMS though. I'm not familiar enough with Access to comment on that unfortunately.

As Quassnoi points out, some ORMs (and other third party applications, ETL solutions, etc.) don't have the capability to handle composite keys. Other than some ORMs though, most recent third party apps worth anything will support composite keys though. ORMs have been a little slower in adopting that in general though.

My personal preference for composite keys is that although a unique index can solve the problem of duplicates, I've yet to see a development shop that actually fully used them. Most developers get lazy about it. They throw on an auto-incrementing ID and move on. Then, six months down the road they pay me a lot of money to fix their duplicate data issues.

Another issue, is that auto-incrementing IDs aren't generally portable. Sure, you can move them around between systems, but since they have no actual basis in the real world it's impossible to determine one given everything else about an entity. This becomes a big deal in ETL.

PKs are a pretty important thing in the data modeling world and they generally deserve more thought then, "add an auto-incrementing ID" if you want your data to be consistent and clean.

Surrogate keys are also useful, but I prefer to use them when I have a known performance issue that I'm trying to deal with. Otherwise it's the classic problem of wasting time trying to solve a problem that you might not even have.

One last note... on cross-reference tables (or joining tables as some call them) it's a little silly (in my opinion) to add a surrogate key unless required by an ORM.

Tom H.
+1  A: 

Composite Keys are not just composite primary keys, but composite foreign keys as well. What do I mean by that? I mean that each table that refers back to the original table needs a column for each column in the composite key.

Here's a simple example, using a generic student/class arrangement.

Person
FirstName
LastName
Address

Class
ClassName
InstructorFirstName
InstructorLastName
InstructorAddress
MeetingTime

StudentClass - a many to many join table
StudentFirstName
StudentLastName
StudentAddress
ClassName
InstructorFirstName
InstructorLastName
InstructorAddress
MeetingTime

You just went from having a 2-column many-to-many table using surrogate keys to having an 8-column many-to-many table using composite keys, because they have 3 and 5 column foreign keys. You can't really get rid of any of these fields, because then the records wouldn't be unique, since both students and instructors can have duplicate names. Heck, if you have two people from the same address with the same name, you're still in serious trouble.

R. Bemrose
Much of this is really an argument against a poorly chosen primary key, not necessarily against composite keys.
Tom H.
@TomH: The post was a bit long, but the point was that composite keys are not just composite primary keys, but composite foreign keys as well with the associated consequence that the other table needs every column that the composite key contains.
R. Bemrose
Finally got around to changing the text for this answer to add the contents of my previous comment to it.
R. Bemrose
A: 

In the first place composite keys are bad for performance in joins. Further they are much worse for updating records as you have to update all the child records as well. Finally very few composite keys are actually really good keys. To be a good key it should be unique and not be subject to change. The example you gave as a composite key you used fails both tests. It is not unique (there are people with the same name born on the same day) and names change frequently causing much unnecessary updating of all the child tables.

As far as table with autogenrated keys casuing duplicates, that is mostly due to several factors:

  • the rest of the data in the table can't be identified in any way as unique
  • a design failure of forgetting to create a unique index on the possible composite key
  • Poor design of the user interface which doesn't attempt to find matching records or which allows data entry when a pull down might be more appropriate.

None of those are the fault of the surrogate key, they just indicate incompetent developers.

HLGEM
+2  A: 

Unfortunately one reason for those negative opinions is probably ignorance. Too many people don't understand the concept of Candidate Keys properly. There are people who seem to think that every table needs only one key, that one key is sufficient for data integrity and that choosing that one key is all that matters.

I have often speculated that it would be a good thing to deprecate and phase out the use of the term "primary key" altogether. Doing that would focus database designers minds on the real issue: that a table should have as many keys as are necessary to ensure the correctness of the data and that some of those keys will probably be composite. Abolishing the primary key concept would do away with all those fatuous debates about what the primary key ought to be or not be.

dportas
True. As far as the relational model is concerned, `PRIMARY KEY` is just a funny way of spelling `UNIQUE`.
dan04
The question is specific to Access, and if using the Jet/ACE as your data store there is a significant limitation, i.e., that you can cluster only on the primary key. It's baked into the engine.
David-W-Fenton
+2  A: 

Most of the answers given here don't seem to me to be given by people who work with Access on a regular basis, so I'll chime in from that perspective (though I'll be repeating what some of the others have said, just with some Access-specific comments).

  1. I use surrogate a key only when there is no single-column candidate key. This means I have tables with surrogate PKs and with single-column natural PKs, but no composite keys (except in joins, where they are the composite of two FKs, surrogate or natural doesn't matter).

  2. Jet/ACE clusters on the PK, and only on the PK. This has potential drawbacks and potential benefits (if you consider a random Autonumber as PK, for instance).

  3. In my experience, the non-Null requirement for a composite PK makes most natural keys impossible without using potentially problematic default values. It likewise wrecks your unique index in Jet/ACE, so in an Access app (before 2010), you end up enforcing uniqueness in your application. Starting with A2010, table-level data macros (which work like triggers) can conceivably be used to move that logic into the database engine.

  4. Composite keys can help you avoid joins, because they repeat data that with surrogate keys you'd have to get from the source table via a join. While joins can be expensive, it's mostly outer joins that are a performance drain, and it's only with non-required FKs that you'd get the full benefit of avoiding outer joins. But that much data repetition has always bothered me a lot, since it seems to go against everything we've ever been taught about normalization!

  5. As I mentioned above, the only composite keys in my apps are in N:N join tables. I would never add a surrogate key to a join table except in the relatively rare case in which the join table is itself a parent to a related tables (e.g., Person/Company N:N record might have related JobTitles, i.e., multiple jobs within the same company). Rather than store the composite key in the child table, you'd store the surrogate key. I'd likely not make the surrogate key the PK, though -- I'd keep the composite PK on the pair of FK values. I would just add an Autonumber with a unique index for joining to the child table(s).

I'll add more as I think of it.

David-W-Fenton
Very informative, thank you very much
Jeff
A: 

I think some coders see the complexity but want to avoid it, and most coders don't even think to look for the complexity at all.

Let's consider a common example of a table that had more than one candidate key: a Payroll table with columns employee_number, salary_amount, start_date and end_date.

The four candidate keys are as follows:

UNIQUE (employee_number, start_date); -- simple constraint 
UNIQUE (employee_number, end_date); -- simple constraint 
UNIQUE (employee_number, start_date, end_date); -- simple constraint 
CHECK (
       NOT EXISTS (
                   SELECT Calendar.day_date
                     FROM Calendar, Payroll AS P1
                    WHERE P1.start_date <= Calendar.day_date
                          AND Calendar.day_date < P1.end_date 
                    GROUP 
                       BY P1.employee_number, Calendar.day_date
                 )
      ); -- sequenced key i.e. no over-lapping periods for the same employee

Only one of those keys are required to be enforced i.e. the sequenced key. However, most coders wouldn't think to add such a key, let alone know how to code it in the first place. In fact, I would wager that most Access coders would add an incrementing autonumber column to the table, make the autonumber column the PRIMARY KEY, fail to add constraints for any of the candidate keys and will have convinced themselves that their table has a key!

onedaywhen
You've restated Sturgeon's Law. Why? How does it improve the discussion of the question? It's actually just a restatement of something that's already been said in other answers/comments, i.e., that just because you use a surrogate key does not release you from responsibility for defining unique indexes on natural keys.
David-W-Fenton
"employee_number" has to be a value provided by some outside authority, otherwise it can't be guaranteed to be usable in the way in which you propose (unless it's a db engine-generated sequence, and in that case, it's the same as a surrogate key, no?). Your structure allows the re-use of an employee_number if an employee leaves and returns. This is not necessarily a desirable structure. It depends on the business rules (whether or not a repeat employee gets assigned a new employee_number or gets a new one). So, I think your example here is quite weak.
David-W-Fenton
I also can't see how you can propose end_date as part of a candidate key, since it will be Null for all current employees, and thus not usable in the candidate key you're proposing.
David-W-Fenton
@David-W-Fenton: "How does it improve the discussion of the question? ...just because you use a surrogate key does not release you from responsibility for defining unique [constraints -- indexes aren't always applicable] on natural keys" -- my point is that most Access coders don't seem to realise that which I why I took the time to point it out.
onedaywhen
@David-W-Fenton: "'employee_number' has to be a value provided by some outside authority" -- you are correct that having a trusted source is a property of a good key. However, the trusted source doesn't have to be external to the enterprise. Usually, the Human Resources department will check references (i.e. find out if the person is who they say they are) then assign then an employee_number using their chosen algorithm, which is then entered into the database. Keys are about trust and most enterprises will trust the authority of their own Human Resources department.
onedaywhen
"...unless [employee_number is] a db engine-generated sequence, and in that case, it's the same as a surrogate key, no?" -- If you are saying that a db engine-generated sequence cannot be a relational key then I agree. As stated above, I propose that employee_number is allocated by the Human Resources department.
onedaywhen
"Your structure allows the re-use of an employee_number if an employee leaves and returns" -- yes and no. Whether to reallocate numbers or not us the decision of the Human Resources department, as you say, according to their business rules. Because a DBMS cannot determine whether one person is the same as another, there is no place for this in the database. So yes it allows it but the implications are irrelevant as regards constraints on the Payroll table.
onedaywhen
@David-W-Fenton: "I also can't see how you can propose end_date as part of a candidate key, since it will be Null for all current employees, and thus not usable in the candidate key you're proposing." -- first, it won't be NULL because I shun NULL values so I am using a far future 'magic' date to model infinity; admittedly, I didn't tell you that ;) The reason for the candidate key is obvious: if periods for the same employee cannot overlap then they must all have distinct end dates. Try drawing on a timeline two periods that have the same end date and you will find they always overlap.
onedaywhen
@David-W-Fenton: "You've restated Sturgeon's Law. Why?" -- ho, ho! I was giving a straight answer to the question, "Why do I read so many negative opinions on using composite keys?" Basically, I'm answering, "Because of ignorance and lack of professional integrity." Sentiments we often hear from your good self, no? ;)
onedaywhen
"which is why I took the time to point it out" -- but it had already been pointed out that one should do it, though without the anti-Access-developer part of the rhetoric. Is it the "all Access developers are idiots" part that you felt was necessary to get into the mix? So far as I can tell, that's the only part of your answer that was new.
David-W-Fenton
I'm glad admit both that you're depending on magic date values and that you recognize it as a mistake.
David-W-Fenton
If you'll look at the tags of the original question, PRIMARY-KEY is included, which makes it pretty clear to me that the question is not about composite keys in general, but about composite keys used as PK. There isn't any dispute over composite keys in general (so far as I'm aware), only over using them as PKs. So, it seems to me you are entirely missing the point.
David-W-Fenton
@David-W-Fenton: "you're depending on magic date values and that you recognize it as a mistake" -- Not so much a 'mistake' as a design compromise. The truth is there is no entirely acceptable solution to the problem and I can cite a good source for this (Snodgrass, Richard T. "Developing Time-Oriented Database Applications in SQL", P120). I could 'design out' the need for a magic date by having a separate table for employees' current salaries but then that would violate the principle of orthogonal design... Choose your compromise. NULLable columns in base tables should be avoided.
onedaywhen
@David-W-Fenton: "pretty clear to me that the question is not about composite keys in general, but about composite keys used as PK" -- You are probably attaching a significance to the word "primary" which I cannot determine but is almost certainly not due (not least because the implications differ between Access and SQL Server). In line with current relational theory, the concept of "primary keys" is best ignored in favour of just "keys".
onedaywhen
Nullable columnns are essential for storing real data, unless you use magic values or pollute your fields with meaningless default values. Null is a perfectly fine default value, but it does mean it can't be part of a unique composite key. Frankly, I don't see why you would ever need anything in your candidate key other than the EmployeeID and start date, unless, of course, your employee gets two paychecks. But using end data doesn't solve the problem except if the end dates are different for the two salaries. You need another field to deal with this, so you don't need a magic value at all.
David-W-Fenton
While in terms of theory you may be right about PK vs. Key, the person asking the question didn't see it that way. Nor is there any dispute among Access developers about composite keys *except* as PK. So, whatever the theoretical considerations, your intepretation makes no sense whatsoever for either the question asked or for reality as it actually exists.
David-W-Fenton
@David-W-Fenton: "I don't see why you would ever need anything in your candidate key other than the EmployeeID and start date, unless, of course, your employee gets two paychecks" -- the rule is they can only receive one salary at any point in time. Consider the rows `{'123456', 50000.00, '2008-01-01', '2009-01-01'}` and `{'123456', 70000.00, '2008-02-01', '2008-11-01'}` satisfy your candidate key but on 2009-06-01 employee 123456 has two salaries. This is the kind of scenario the sequenced key needs to prevent.
onedaywhen
I don't have a candidate key in this fight. I'm saying that your argument is incoherent because your example is illogical.
David-W-Fenton