views:

226

answers:

4

Let's say we have a Product table, and Order table and a (junction table) ProductOrder.

ProductOrder will have an ProductID and an OrderID.
In most of our systems these tables also have an autonumber column called ID.

What is the best practice for placing the primary key (and therefor clustered key)?

  • Should I keep the primary key of the ID field and create a non-clustered index for the foreign key pair (ProductID and OrderID)

  • Or should I put the primary key of the foreign key pair (ProductID and OrderID) and put a non-clustered index on the ID column (if even necessary)

  • Or ... (smart remark by one of you :))

A: 

This appears to be for a dynamic system where many orders will be added. The clustered index should therefore be on your autonumbered column.

You can make index the primary key and put another unique index on the pair of columns. Or, you can make the pair of columns the primary (but non-clustered) key.

The choice of using the primary key or a unique index key is up to you. But I would make sure that the one that is clustered is for your autonumber column.

Glen Little
I disagree. The autonumber column has no meaning, except "this is the order the records were created". The cluster will order the rows by the cluster key. Autonumber "in order of creation" is often not the order you query the table. Most likely, this table will be used in a join against products and orders using the ProductID and OrderID. I'd cluster for ordering so the cluster key is in the same order as the driving table of the most common query, otherwise you are wasting the cluster index.
mrjoltcola
I guess the question comes down to how busy is this table? If there are many inserts, then you could get to a state where the table has to be restructured on every insert. This would slow the system down, especially with a big table.With the clustered index on the autonumber, this does not happen. The other index could have padding set to give extra room to avoid restructuring.
Glen Little
+2  A: 

I know these words might make you cringe, but "it depends."

1) First answer why you are using the surrogate key ID in the first place; that might help add context to the problem. I assume you are using the surrogate key because you are using some framework that works well with single column keys. If there is no specific design reason, then for a join table, I'd simplify the problem and just remove the autonumber ID, because it brings no other benefit. The primary key becomes the (ProductID, OrderID). If not, you need to at least make sure your index on the (ProductID, OrderID) tuple is unique to preserve data integrity.

2) Clustered indexes are good for sequential scans/joins when the query needs the results in the same order that the index is ordered. So, look at your access patterns, figure out by which key(s) you will be doing multi-row selects / scans, and by why key you'll be doing individual record access, and create the clustered index on the key you'll scan most, and the non-clustered key on the id you'll randomly access. You have to choose one or the other, since you cannot cluster both.

mrjoltcola
A: 

My preference has always been to create an autonumber for Primary Keys. Then I create a unique index on the two Foreign keys so that they are not duplicated.

The reason I do this is because the more I normalize my data, the more keys I have to use in joins. I have ended up with designs going six to seven levels deep, and if I use keys flowing from one level to another, I could potentially end up with a n^2 keys in the join.

Try convincing my SQL Developers to use all of that for a single query, and they will really like me.

I keep it simple.

Raj More
Why add a 3rd column when the linked tables already use autonumber columns? Pointless
gbn
Don't let developers dictate the design just because they are too lazy to write joins or views. Linking tables have no business having an autonumber column, as already noted, because they are an implementation detail that shouldn't be obvious. You shouldn't be querying or exposing the linking table by itself, it will always be part of a join. Consider writing a view, let the developers use the view, and the linking table will never be evident.
mrjoltcola
@gbn and @mrjoltcola: Most Join tables have additional attributes. An OrderItem table may to have the ItemPrice so changes to the lookup Items table do not change up order price from Quoting to Shipment. So I make an OrderItemId for this table. When this is joined down to the ShippingLocationItems table, I have a 2 key join instead of a 1 key join. When I track back with the DeliverConfirmatioForItems, I have again, a 1 column foreign key instead of 8 or more cols for the shipping loc details, the delivery and signature mechanisms. Do this for all join tables and keep things uniform.
Raj More
A: 

The correct and only answer is:

  • Primary key is ('orderid' , 'productid')
  • Another index on ('productid' , 'orderid')
  • Either can be clustered, but PK is by default

Because:

  • You don't need an index on orderid or productid alone: the optimiser will use one of the indexes
  • You'll most likely use the table "both" ways
  • You don't need a surrogate key because you already have them on the linked tables. So a 3rd columns wastes space.
gbn
I did read that it's considered bad practice to add columns from the clustered index to the non-clustered indexes: The clustered index is always added to the non-clustered index. Therefor the index 'productid' + 'orderid' wouldn't make sense?
Zyphrax
@Zyphrax: the column order makes it a completely different index
gbn