views:

89

answers:

2

Hi all, I have a question regarding with this database-design. I am a bit unsure of the difference between identifying and non-identifying relationships in a database leading me to some puzzles in my head.

I have this database design: *(kind of like a movie rental stores. "friend" are those who borrow the movie. "studio" is the production studios that collaborated in making the movie.)

View the images here (Sorry about the links.. (I don't have 10 reputation to post images .. x{ )

which I somewhat understand how it works. However, I was wondering what if I create a loan_id in the loan table, and use movie_id and friend_id as normal foreign keys?

Some of my questions are: What are the advantages or disadvantages of the later approach? A situation where the initial or later model is better? Does the initial model enable a friend to borrow a movie more than once?

Any thorough explanation would be much appreciated.

+2  A: 

The way you have all of your many-to-many tables (tables collaboration, loan, role), is called a composite primary key: Where two (or more) columns form a unique value.

When you have a composite pk, a lot of db designers prefer to create a surrogate primary key (like your proposed loan_id). I'm one of them. This post does a good job going through the arguments of why or why not: http://stackoverflow.com/questions/159087/composite-primary-keys-versus-unique-object-id-field.

My relatively simple reason for it, is composite keys tend to grow: Using the loan example, what happens if that movies loaned more than once? Using the composite approach, you would then have to add loan_date to the composite key.

What if you then wanted to track re-loans of some sort? You would then have to have a 2nd table carrying all the composite pk fields from the loan table (original_loan_movie_id, original_loan_friend_id, original_loan_date) just to refer to the original loan...

Shlomo
So you recommend a single column key for the sake of insulating your referential integrity because of poor business analysis and/or design?
OMG Ponies
You're assuming the requirements won't change, I'm assuming they will. I also wouldn't call it poor analysis, I would state quite obviously that assuming your business requirements won't change is naive. As for other reasons: Inserts for identity PKs (assuming clustered index is on the PK) perform better than natural key PK, though that doesn't apply here. ORM software does better with surrogates.
Shlomo
A: 

In the LOAN table, you'd need to guarantee the following columns are unique:

  • movie_id (replace with copy_id assuming there are multiple copies of a movie)
  • friend_id
  • loan_date

...because I, or anyone else, should be able to rent the same movie more than once. These are also the columns most likely to be searched on...

With that in mind, the idea of defining a column called loan_id as the primary key for the table to be redundant. ORMs have been mandating the use of non-composite primary keys to simplify queries...

But it Makes Queries Easier...

At first glance, it makes deleting or updating a specific loan/etc easier - until you realize that you need to know the applicable id value first. If you have to search for that id value based on a movie, user/friend, and date then you'd have been better off using the criteria directly in the first place.

But Composite keys are Complex...

In this example, a primary key constraint will ensure that the three columns--movie_id, friend_id and loan_date--will be unique and indexed (most DBs these days automatically index primary keys if the clustered index doesn't already exist) using the best index possible for the table.

The lone primary key approach means the loan_id is indexed with the best index for the table (SQL Server & MySQL call them clustered indexes, to Oracle they're all just indexes), and requires an additional composite unique constraint/index. Some databases might require additional indexing beyond the unique constraint... So this makes the data model more involved/complex, and for no benefit:

  • Some databases, like MySQL, put a limit on the amount of space you can use for indexes.
  • the primary key is getting the most ideal index yet the value has no relevance to the data in the table, so making use of the index related to the primary key will be seldom if ever.

Conclusion

I've yet to see a legitimate justification for a single column primary key over a composite primary key.

OMG Ponies
One possible reason is when the parts of the primary key are (or might become) mutable. Then, all referring foreign keys will also need to be updated. In the example here, what if the system were for scheduling loans, and somebody mistyped the date he wants to have the movie?
meriton
@meriton: Every column a database is mutable, assuming you have the privileges necessary. This is why database design/data-modeling is not for iterative development - a constantly changing model means that poor requirements gathering and/or database design/data-modeling. The real question is - do you see the business (in this case, property loaning/renting videos) changing substantially in the near future?
OMG Ponies
The indexing argument is fallacious, at least for Sql Server. The clustered index doesn't have to be the primary key. Also, generally identity (surrogate key) columns are not mutable.
Shlomo
@Shlomo: IDENTITY is only a sequential value; it's the constraint (primary key or unique) that enforces uniqueness in the column. As I said before - if you have enough privilege, you can change the value. It might require disabling a constraint and/or using [IDENTITY_INSERT](http://msdn.microsoft.com/en-us/library/aa259221%28SQL.80%29.aspx), but there's no case where a database contains data that can not be changed.
OMG Ponies
@Shlomo: While SQL Server supports making the clustered index on column(s) other than the primary key, it's still overcomplicated and no benefit vs defining a composite primary key. Additionally, `CASCADE ON DELETE` and `CASCADE ON UPDATE` allow foreign key references without dropping or deferring constraints...
OMG Ponies
@OMG: Cascading updates/deletes perform poorer than a straight update. Identity columns are by default immutable (as they should be). They're supposed to be meaningless (outside of the ability to reference and x-reference), hence an update to them also would be meaningless and unnecessary.
Shlomo
@OMG: and @ your assertion that you have yet to see a reason for surrogate keys: Composite keys will continue to grow. Using the given example of loans, if you ever wanted to have another table referring to the loans table, then you would have to carry over 3 columns as a foreign key. If you wanted to add a column to the loans table, say a copy_id column, you would have to add that to the pk as well, then add that column to any referencing table. It ends up being a nightmare.
Shlomo
@Shlomo: Performance has nothing to with your claim of immutability, nor does being an artificial vs natural key. I provided a link to SQL Server documentation that demonstrates that the IDENTITY can be manipulated. Between that, an UPDATE and ALTER TABLE statements - how do you justify for your (and meriton's) claim of immutability?
OMG Ponies
OMG Ponies