ansaurus

Question

Clustered index on frequently changing reference table of one or more foreign keys

Answer 1

A:

Assume you delete and re-insert a set of rows for a given (compound) primary key:

If the table index is clustered, you drop the leaf-level data and any "upper-level" index page data, then add data back to the pages, and add the lookup data in the upper level pages. Call it at best four page writing operations.
If the table index is non-clustered, you drop the heap data, the "upper-level" index data, and the leaf-level index data, and then write heap data, upper0level index data, and leaf-level index data. At best, that's six page writing operations.
In either case you'll have to worry about index building/revising, and with non-clustered you have to manage modifications to the heap table as well as tracking all the index-to-data references.

Performance-wise, it certainly seems like the clustered index is the way to go... though certain operational considerations may trump that. (How many drops/inserts, frequency, table growing at then end [idenity values?] or middle [inserting new PK values], overall size, frequency of updates vs. concurrency/locking issues, etc. etc.)

The only way to avoid this is to not have any index on the heap, and the odds are good that you don't want to do that.

In all cases, you may get high table fragmentation, so (depending on overall table size) periodic index rebuilds might be good.

-- Update based on first comment ------------------

My initial answer was based on the following assumptions:

All rows being dropped within a given transaction (i.e. INSERT or DELETE statement) would be for a single Collection. That is, N Items would be added/dropped for a single collection.
A single (and preferrably clustered) index would exist on columns (Collection_pk, Item_pk), with Collection_pk the firxt column.

Done this way, whenever you add or drop a set of rows, only that small part (unless it involves hundreds or more rows) of the index/table would need to be modified. My comments were geared towards that design.

Remember, with a clustered index, the "table itself", i.e. the data rows, is the leaf-level of the clustered index--so, again, only that part of the index would ever need to be modified. With a non-clustered index on top of a heap, you still have those extra pages to maintain, and I think frequent delete/inserts would cause some serious table fragmentation.

If a second index existed on (Item_pk, Collection_pk), which would be necessary if you had to perform lookups by item, then it gets tricky. In this case:

For the same reasons, it would be more efficient to have a clustered index and a non-clustered index.
Absoutely, you will take a performance hit maintaining that second index, as insertion/deletion actions will occur througout the item-first index all the time.

It sounds like you don't have and don't need that second index, so don't worry about it.

Philip Kelley 2010-05-24 01:49:23

the collections_item table is already over a million rows and is expected to grow at least ten fold over the next six months. also your comments discuss affects on the index, but im more concerned with overhead in reordering the table itself because of the clustered aspect of the index. am i missing something ? thanks.

Ian 2010-05-24 02:14:18

Updated my answer based on Ian's (first) comment

Philip Kelley 2010-05-24 14:33:08

ansaurus

tags:

views:

answers:

Clustered index on frequently changing reference table of one or more foreign keys

related questions