Where to place a primary key

+3 Q:

To my knowledge SQL Server 2008 will only allow one clustered index per table. For the sake of this question, let's say I have a table of user-submitted stories with the following columns:

ID (int, primary key)
Title (nvarchar)
Url (nvarchar)
UniqueName (nvarchar): the URL slug (e.g. blah-blah-blah)
CategoryID (int, FK to Category table)

Stories will rarely, if ever, be queried by ID. Most queries will filter either on CategoryID or on UniqueName.

I'm new to indexing, so I assumed it would be best to put two nonclustered indexes on this table: one on UniqueName and one on CategoryID. After doing some reading about indexes, it seems that having a clustered index on UniqueName would be very beneficial. Considering that UniqueName is... unique, would it be advantageous to put the primary key on UniqueName and get rid of the ID field? As for CategoryID, I assume a nonclustered index will do just fine.
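The two layouts being weighed here can be compared in a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, which is an assumption. SQLite has no CLUSTERED keyword, but a WITHOUT ROWID table is stored as a B-tree keyed on its primary key, which is roughly what a clustered primary key on UniqueName would be. The index names are hypothetical. The query plans show which structure each lookup would search.

```python
import sqlite3

BY_SLUG = "SELECT * FROM Story WHERE UniqueName = ?"
BY_CATEGORY = "SELECT * FROM Story WHERE CategoryID = ?"

def plan(conn, sql, params=()):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

def build(clustered_on_slug):
    """Create the Story table in one of the two layouts from the question."""
    conn = sqlite3.connect(":memory:")
    if clustered_on_slug:
        # WITHOUT ROWID: the table itself is a B-tree ordered by UniqueName,
        # roughly like a clustered primary key on UniqueName in SQL Server.
        conn.execute("""
            CREATE TABLE Story (
                UniqueName TEXT    NOT NULL PRIMARY KEY,
                ID         INTEGER NOT NULL UNIQUE,
                Title      TEXT,
                Url        TEXT,
                CategoryID INTEGER
            ) WITHOUT ROWID""")
    else:
        # Ordinary table: rows are stored by the integer ID (the rowid),
        # and UniqueName gets its own unique secondary index.
        conn.execute("""
            CREATE TABLE Story (
                ID         INTEGER PRIMARY KEY,
                Title      TEXT,
                Url        TEXT,
                UniqueName TEXT NOT NULL,
                CategoryID INTEGER
            )""")
        conn.execute("CREATE UNIQUE INDEX IX_Story_UniqueName ON Story (UniqueName)")
    # CategoryID gets a secondary (nonclustered-style) index in both layouts.
    conn.execute("CREATE INDEX IX_Story_CategoryID ON Story (CategoryID)")
    return conn

for clustered in (False, True):
    conn = build(clustered)
    print("clustered on slug" if clustered else "clustered on ID")
    print("  ", plan(conn, BY_SLUG, ("blah-blah-blah",)))
    print("  ", plan(conn, BY_CATEGORY, (1,)))
```

Both layouts answer a slug lookup with a keyed search. The difference is whether that search lands directly in the table or goes through a separate index first.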

Thanks.

+3 A:

First, you can put the clustered index on UniqueName; it doesn't have to be on the ID field. If you do little or no joining to this table, you could get rid of the ID. In any event, I would put a unique index on the UniqueName field (you may find in doing so that it isn't as unique as you thought it would be!).
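The warning about hidden duplicates is easy to check before adding the index. Here is a minimal sketch, again using Python's sqlite3 as a stand-in for SQL Server, which is an assumption. The sample rows and the helper names duplicate_slugs and add_unique_index are hypothetical. Creating the unique index fails while duplicates exist, and a GROUP BY query shows which slugs collide.

```python
import sqlite3

def duplicate_slugs(conn):
    """Return the UniqueName values that occur more than once."""
    rows = conn.execute(
        "SELECT UniqueName FROM Story "
        "GROUP BY UniqueName HAVING COUNT(*) > 1 ORDER BY UniqueName")
    return [r[0] for r in rows]

def add_unique_index(conn):
    """Try to enforce uniqueness; on failure, report the colliding slugs."""
    try:
        conn.execute("CREATE UNIQUE INDEX IX_Story_UniqueName ON Story (UniqueName)")
        return []
    except sqlite3.IntegrityError:  # raised when existing rows violate UNIQUE
        return duplicate_slugs(conn)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Story (ID INTEGER PRIMARY KEY, Title TEXT, UniqueName TEXT NOT NULL)")
conn.executemany("INSERT INTO Story (Title, UniqueName) VALUES (?, ?)", [
    ("Blah blah blah", "blah-blah-blah"),
    ("Blah, blah, blah!", "blah-blah-blah"),  # different title, same slug
    ("Something else", "something-else"),
])
print(add_unique_index(conn))  # ['blah-blah-blah']
```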

If you do a lot of joining, though, I would keep the ID field: it is smaller and more efficient to join on.
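To put a rough number on "smaller", here is a back-of-the-envelope sketch. It assumes SQL Server's documented sizes: int is 4 bytes, and nvarchar takes 2 bytes per character plus 2 bytes of overhead. It ignores row headers, page overhead and compression. The one-million-row child table is hypothetical. The gap also matters beyond joins, because in SQL Server every nonclustered index on a clustered table carries the clustering key as its row locator. Clustering on the slug therefore repeats it in each of those indexes.

```python
INT_BYTES = 4  # SQL Server int: fixed 4 bytes

def nvarchar_bytes(value: str) -> int:
    # nvarchar stores UTF-16 (2 bytes per code unit) plus 2 bytes of
    # variable-length overhead; row headers and compression are ignored.
    return len(value.encode("utf-16-le")) + 2

def key_column_bytes(rows: int, key_bytes: int) -> int:
    # Bytes spent on the key column alone across `rows` referencing rows.
    return rows * key_bytes

slug = "blah-blah-blah"
print(nvarchar_bytes(slug))                               # 30
print(key_column_bytes(1_000_000, INT_BYTES))             # 4000000
print(key_column_bytes(1_000_000, nvarchar_bytes(slug)))  # 30000000
```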

Since you say you are new at indexing, I will point out that while primary keys have an index created automatically when they are defined, foreign keys do not. You almost always want to index your foreign key fields.
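The point about foreign keys shows up clearly in a query plan. Here is a minimal sketch using Python's sqlite3 as a stand-in, which is an assumption. Like SQL Server, SQLite does not index a foreign key column automatically. A lookup by the primary key is a keyed search straight away, while a lookup by CategoryID is a full scan until an index is created on it. The index name IX_Story_CategoryID is hypothetical.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE Category (ID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("""
    CREATE TABLE Story (
        ID         INTEGER PRIMARY KEY,
        UniqueName TEXT NOT NULL,
        CategoryID INTEGER REFERENCES Category (ID)
    )""")

by_id = "SELECT * FROM Story WHERE ID = ?"
by_category = "SELECT * FROM Story WHERE CategoryID = ?"

pk_plan = plan(conn, by_id, (1,))               # primary key: searched by key
fk_plan_before = plan(conn, by_category, (1,))  # bare foreign key: full scan
conn.execute("CREATE INDEX IX_Story_CategoryID ON Story (CategoryID)")
fk_plan_after = plan(conn, by_category, (1,))   # now an index search

for p in (pk_plan, fk_plan_before, fk_plan_after):
    print(p)
```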

HLGEM 2009-02-22 18:57:03

I did not know you could split up the PK and the clustered index. Thanks a lot.

Chad Moran 2009-02-22 19:02:18

The primary key is clustered by default, but you don't have to make it so.

HLGEM 2009-02-22 19:27:32

+1 A:

Just out of habit, I always create an Identity field "ID" like you have as the PK. It makes things consistent. If all "master" tables have a field named "ID" that is INT Identity, then it's always obvious what the PK is. Additionally, if I need to make a bridge entity, I'll be storing two (or more) columns of type INT instead of type nvarchar(). So in your example, I would keep ID as the PK and create a unique index on UniqueName.
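Here is a minimal sketch of the bridge-entity case. It uses hypothetical Tag and StoryTag tables, with Python's sqlite3 standing in for SQL Server. The bridge stores two INT keys and the joins run on those keys. UniqueName is kept unique but is not the PK, and it is read from Story only for output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Story (
        ID         INTEGER PRIMARY KEY,
        UniqueName TEXT NOT NULL UNIQUE      -- unique, but not the PK
    );
    CREATE TABLE Tag (                        -- hypothetical lookup table
        ID   INTEGER PRIMARY KEY,
        Name TEXT NOT NULL UNIQUE
    );
    -- Bridge entity: two INT columns rather than two text keys
    CREATE TABLE StoryTag (
        StoryID INTEGER NOT NULL REFERENCES Story (ID),
        TagID   INTEGER NOT NULL REFERENCES Tag (ID),
        PRIMARY KEY (StoryID, TagID)
    );
    INSERT INTO Story (ID, UniqueName) VALUES (1, 'blah-blah-blah'), (2, 'something-else');
    INSERT INTO Tag (ID, Name) VALUES (1, 'sql'), (2, 'indexing');
    INSERT INTO StoryTag VALUES (1, 1), (1, 2), (2, 2);
""")

def slugs_for_tag(conn, tag_name):
    # Joins run on the INT keys; the slug is only read from Story for output.
    rows = conn.execute("""
        SELECT s.UniqueName
        FROM Tag t
        JOIN StoryTag st ON st.TagID = t.ID
        JOIN Story s     ON s.ID = st.StoryID
        WHERE t.Name = ?
        ORDER BY s.UniqueName""", (tag_name,))
    return [r[0] for r in rows]

print(slugs_for_tag(conn, "indexing"))  # ['blah-blah-blah', 'something-else']
```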

HardCode 2009-02-22 18:57:19

+1 A:

Data is stored in the order of the clustered key. If most of your retrieval will be keyed on one of those fields, it would be advantageous to cluster on it. That assumes new values don't arrive so scattered across the key range that they fragment the table, which can slow down insert performance.

On the other hand, if this table is joined to a lot on the ID, it probably makes more sense to keep the clustered key on the PK.
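The insert-pattern point can be shown with a toy model. It does not describe SQL Server's storage engine. It keeps keys in sorted order, as a clustered index does, and counts how many new keys land at the very end versus somewhere in the middle, where a real B-tree might have to split a full page. The sample slugs are made up. Sequential IDs always append, while slugs arrive in no particular order.

```python
from bisect import bisect_left, insort

def append_fraction(keys):
    """Fraction of inserts that land at the end of a key-ordered sequence.

    Toy model of a clustered index: rows are kept sorted by key, so each
    insert either appends at the end or goes somewhere in the middle.
    """
    ordered = []
    appends = 0
    for key in keys:
        if bisect_left(ordered, key) == len(ordered):
            appends += 1  # new key sorts after everything already stored
        insort(ordered, key)
    return appends / len(keys)

IDS = list(range(1, 9))
SLUGS = ["where-to-place-a-pk", "blah-blah-blah", "indexing-101", "zebra-tables",
         "clustered-or-not", "heaps", "user-stories", "a-first-story"]

print(append_fraction(IDS))    # 1.0
print(append_fraction(SLUGS))  # 0.25
```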

Joe 2009-02-22 19:03:52

+1 A:

There is no requirement or necessity to have a clustered index at all, primary key or otherwise. It's a performance optimisation tool, like all indexing strategies, and should be applied when an improvement can be gained by using it.

As already mentioned, because the table is physically sorted according to the clustered index key, it's a Highlander situation: there can only be one!

Clustered indexes are mostly useful for situations such as:

you regularly need to retrieve a set of rows whose values for a given column are in a range, so columns that are often the subject of a BETWEEN clause are interesting; or
most of your single-row hits in the table occur in an area that can be described by a subset of the values of a key.
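The range case can be sketched with a hypothetical SubmittedOn date column, again using Python's sqlite3 as a stand-in, which is an assumption. The WITHOUT ROWID table is stored in primary-key order, roughly like a SQL Server clustered index on (SubmittedOn, ID). A BETWEEN on the date therefore becomes a single range search over adjacent rows. ID is part of the key only because a SQLite primary key must be unique. SQL Server would accept a non-unique clustered key and add a hidden uniquifier.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the 'detail' column of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Story (
        SubmittedOn TEXT    NOT NULL,  -- hypothetical; ISO-8601 text sorts by date
        ID          INTEGER NOT NULL,
        UniqueName  TEXT    NOT NULL,
        PRIMARY KEY (SubmittedOn, ID)
    ) WITHOUT ROWID""")
conn.executemany("INSERT INTO Story VALUES (?, ?, ?)", [
    ("2009-02-20", 1, "first"),
    ("2009-02-21", 2, "second"),
    ("2009-02-22", 3, "third"),
    ("2009-02-23", 4, "fourth"),
])

RANGE_SQL = ("SELECT UniqueName FROM Story "
             "WHERE SubmittedOn BETWEEN ? AND ? ORDER BY SubmittedOn")
ARGS = ("2009-02-21", "2009-02-22")

print(plan(conn, RANGE_SQL, ARGS))                    # a range search on the primary key
print([r[0] for r in conn.execute(RANGE_SQL, ARGS)])  # ['second', 'third']
```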

I thought that they were particularly unhelpful in high-volume transaction systems with very frequent inserts, when a sequential key is the clustered column: you'd get a gang of processes all trying to insert at the same physical location (a "hot-spot"). It turns out, as was commented here before this edit, that I'm sadly out of date and showing my age. See this post on the topic by Kimberly Tripp, which says it all much better.

Sequential numeric "ID" columns are generally not good candidates for the clustered index. Names can be good, and dates likewise, if carefully considered.

Mike Woodhouse 2009-02-22 19:23:49

If you believe Kimberly Tripp, and why would anyone not believe her :-), the hot-spot phenomenon was relevant before the days of row locking. These days, doing updates in the same spot is actually beneficial because of caching.

Darrel Miller 2009-02-23 03:37:16

Blimey, how out of date am I? I've tried to address that and include a link to rectify it. Thanks!

Mike Woodhouse 2009-02-23 07:09:35

+1 A:

Generally it's best to index a table on an identity key and use that as the clustered index. There's a simple rule of thumb here:

Don't use a meaningful column as the primary key

The reason for this is that putting the PK on a meaningful column tends to give rise to maintenance issues. It's a rule of thumb, so it can be overridden if circumstances dictate. Usually, though, it's best to work from the default position that each table is indexed by a (clustered) meaningless identity column. This tends to be more efficient for joins. It is also the default design most DBAs will adopt, so it won't raise any eyebrows or cause problems because the system isn't what the next DBA might assume. Meaningless PKs are invariably more flexible and can be adapted more easily to changing circumstances than meaningful ones.
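A changing slug (a typo fixed in a title, say) is one concrete case of the maintenance issue described above. Here is a minimal sketch using Python's sqlite3 as a stand-in for SQL Server and a hypothetical Comment child table. With UniqueName as the primary key, every referencing row has to be rewritten. Here ON UPDATE CASCADE does that, and SQL Server also supports it. With a meaningless ID, the child rows are untouched.

```python
import sqlite3

NATURAL_SCHEMA = """
    CREATE TABLE Story (UniqueName TEXT PRIMARY KEY, Title TEXT);
    -- Hypothetical child table that references the slug itself
    CREATE TABLE Comment (
        ID        INTEGER PRIMARY KEY,
        StorySlug TEXT NOT NULL
                  REFERENCES Story (UniqueName) ON UPDATE CASCADE
    );
    INSERT INTO Story VALUES ('blah-blah-blah', 'Blah');
    INSERT INTO Comment (StorySlug) VALUES ('blah-blah-blah'), ('blah-blah-blah');
"""

SURROGATE_SCHEMA = """
    CREATE TABLE Story (ID INTEGER PRIMARY KEY, UniqueName TEXT NOT NULL UNIQUE, Title TEXT);
    -- Same hypothetical child table, referencing the meaningless ID
    CREATE TABLE Comment (
        ID      INTEGER PRIMARY KEY,
        StoryID INTEGER NOT NULL REFERENCES Story (ID)
    );
    INSERT INTO Story VALUES (1, 'blah-blah-blah', 'Blah');
    INSERT INTO Comment (StoryID) VALUES (1), (1);
"""

def rename_story(natural_key, new_slug):
    """Rename the story's slug and return what the Comment rows now hold."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # needed for ON UPDATE CASCADE in SQLite
    conn.executescript(NATURAL_SCHEMA if natural_key else SURROGATE_SCHEMA)
    conn.execute("UPDATE Story SET UniqueName = ? WHERE UniqueName = ?",
                 (new_slug, "blah-blah-blah"))
    column = "StorySlug" if natural_key else "StoryID"
    return [row[0] for row in conn.execute(f"SELECT {column} FROM Comment ORDER BY ID")]

print(rename_story(True, "blah-blah-blah-2"))   # ['blah-blah-blah-2', 'blah-blah-blah-2']
print(rename_story(False, "blah-blah-blah-2"))  # [1, 1]
```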

When should you override the rule? Only if you do envisage performance issues. For most suitably indexed databases with reasonable loads on modern hardware, you will not have any trouble unless you need to squeeze the last millisecond of performance out of them by clustering the optimal index. DBA and programmer cycles are much more expensive than CPU cycles. If you'll only shave the odd millisecond off your queries by adopting a different strategy, it's just not worth it. However, if you are looking at a table approaching a million rows, that's a different matter. It depends very much on circumstances, but generally, if I'm designing a database with tables of fewer than 100,000 rows, I will lean heavily towards designing for flexibility, ease of writing stable queries, and the principles any other designer would expect to see. Over a million rows, I design for performance. Between 100k and a million, it's a matter of judgement.

Cruachan 2009-02-22 20:23:30

If you're going to argue one side of an intense religious discussion, you should at least mention that the other position exists, and is at least somewhat defensible.

le dorfier 2009-02-22 20:57:57

I'm not being religious about it. A dogmatist would argue that you should always use an identity int PK, and I really don't feel that's true. But working from the assumption that you will use an identity PK unless you have a reason not to is a safe position to take...

Cruachan 2009-02-22 21:29:41

The worst position is where your designer hasn't thought about the issue and mixes identity and meaningful keys at random as the mood takes; that's worse than either.

Cruachan 2009-02-22 21:31:09
