views:

476

answers:

7

Say you have a Many-Many table between Artists and Fans. When it comes to designing the table, do you design the table like such:

ArtistFans
    ArtistFanID (PK)
    ArtistID (FK)
    UserID (FK)

 (ArtistID and UserID will then be contrained with a Unique Constraint 
  to prevent duplicate data)

Or do you build use a compound PK for the two relevant fields:

ArtistFans
    ArtistID (PK)
    UserID (PK)

(The need for the separate unique constraint is removed because of the 
 compound PK)


Are there are any advantages (maybe indexing?) for using the former schema?

+2  A: 

I cannot think of any reason to use the first form you list. The compound primary key is fine, and having a separate, artificial primary key (along with the unique contraint you need on the foreign keys) will just take more time to compute and space to store.

Brian Campbell
+1  A: 

The standard way is to use the composite primary key. Adding in a separate autoincrement key is just creating a substitute that is already there using what you have. Proper database normalization patterns would look down on using the autoincrement.

TheTXI
+7  A: 
ArtistFans
    ArtistID (PK)
    UserID (PK)

The use of an auto incremental PK has no advantages here, even if the parent tables have them.

I'd also create an index (UserID, ArtistID) too

Autonumber/ID columns have their place. You'd choose them to improve certain things after the normalisation process based on the physical platform. (Don't quote Celko back to me please)

gbn
+1 for the extra call for an index on the composite key
TheTXI
A primary key is just a special type of index. An index that duplicates the primary key only adds overhead.
Andomar
The order is important: this is a different index
gbn
Ah ok, makes sense then :)
Andomar
It's important when drilling from user to artist to have the reverse (aka all artists for a user). With the reverse index, it allows an index seek rather that a scan of the PK. The PK is enough for a "number of users per artist" query: but not the other way around.
gbn
+2  A: 

Even if you create an identity column, it doesn't have to be the primary key.

ArtistFans
    ArtistFanId
    ArtistId (PK)
    UserId (PK)

Identity columns can be useful to relate this relation to other relations. For example, if there was a creator table which specified the person who created the artist-user relation, it could have a foreign key on ArtistFanId, instead of the composite ArtistId+UserId primary key.

Also, identity columns are required (or greatly improve the operation of) certain ORM packages.

Andomar
Which means then to find creator-artist details you *always* have to include ArtistFans table in the JOIN = slower.
gbn
In this case, it seems unlikely that you're interested in the creator of the artist-user relation, but not in the user.
Andomar
A: 

Funny how all answers favor variant 2, so I have to dissent and argue for variant 1 ;)

To answer the question in the title: no, you don't need it. But...

Having an auto-incremental or identity column in every table simplifies your data model so that you know that each of your tables always has a single PK column.

As a consequence, every relation (foreign key) from one table to another always consists of a single column for each table.

Further, if you happen to write some application framework for forms, lists, reports, logging etc you only have to deal with tables with a single PK column, which simplifies the complexity of your framework.

Also, an additional id PK column does not cost you very much in terms of disk space (except for billion-record-plus tables).

Of course, I need to mention one downside: in a grandparent-parent-child relation, child will lose its grandparent information and require a JOIN to retrieve it.

devio
...and a lot of unique indexes (unique constraints are indexes too on disk) to ensure data integrity of your natural key...
gbn
I don't agree that adding an extra column makes things simpler. And using an auto increment surrogate key in place of the natural PK makes data integrity weaker, unless you add in the extra constraint, as devio commented. If you did your analysis well, entity tables will have a simple PK, while relationship tables will have a compound PK. What could be simpler?
Walter Mitty
+1  A: 

Assuming that you're already a devotee of the surrogate key (you're in good company), there's a case to be made for going all the way.

A key point that is sometimes forgotten is that relationships themselves can have properties. Often it's not enough to state that two things are related; you might have to describe the nature of that relationship. In other words, there's nothing special about a relationship table that says it can only have two columns.

If there's nothing special about these tables, why not treat it like every other table and use a surrogate key? If you do end up having to add properties to the table, you'll thank your lucky presentation layers that you don't have to pass around a compound key just to modify those properties.

I wouldn't even call this a rule of thumb, more of a something-to-consider. In my experience, some slim majority of relationships end up carrying around additional data, essentially becoming entities in themselves, worthy of a surrogate key.

The rub is that adding these keys after the fact can be a pain. Whether the cost of the additional column and index is worth the value of preempting this headache, that really depends on the project.

As for me, once bitten, twice shy – I go for the surrogate key out of the gate.

WCWedin
A: 

I liked WCWedin's answer (I'm a newb so I can't vote it up yet). I'm looking at creating database relationships that have start and end date properties.

Nickro