views:

50

answers:

1

My specific concern is related to the performance of a clustered index on a reference table that has many rapid inserts and deletes.

Table 1 "Collection" collection_pk int (among other fields)
Table 2 "Item" item_pk int (among other fields)
Reference Table "Collection_Items" collection_pk int, item_pk int (combined primary key)

Because the primary key is composed of both pks, a clustered index is created and the data physically ordered in the table according to the combined keys.

I have many users creating and deleting collections and adding and removing items to those collections very frequently affecting the "Collection_Items" table, and its clustered index.

QUESTION PART: Since the "Collection_Items" table is so dynamic, wouldn't there be a big performance hit on constantly resorting the table rows because of the clustered index ?

If yes, what should I do to minimize this ?

A: 

Assume you delete and re-insert a set of rows for a given (compound) primary key:

  • If the table index is clustered, you drop the leaf-level data and any "upper-level" index page data, then add data back to the pages, and add the lookup data in the upper level pages. Call it at best four page writing operations.
  • If the table index is non-clustered, you drop the heap data, the "upper-level" index data, and the leaf-level index data, and then write heap data, upper0level index data, and leaf-level index data. At best, that's six page writing operations.
  • In either case you'll have to worry about index building/revising, and with non-clustered you have to manage modifications to the heap table as well as tracking all the index-to-data references.

Performance-wise, it certainly seems like the clustered index is the way to go... though certain operational considerations may trump that. (How many drops/inserts, frequency, table growing at then end [idenity values?] or middle [inserting new PK values], overall size, frequency of updates vs. concurrency/locking issues, etc. etc.)

The only way to avoid this is to not have any index on the heap, and the odds are good that you don't want to do that.

In all cases, you may get high table fragmentation, so (depending on overall table size) periodic index rebuilds might be good.

-- Update based on first comment ------------------

My initial answer was based on the following assumptions:

  • All rows being dropped within a given transaction (i.e. INSERT or DELETE statement) would be for a single Collection. That is, N Items would be added/dropped for a single collection.
  • A single (and preferrably clustered) index would exist on columns (Collection_pk, Item_pk), with Collection_pk the firxt column.

Done this way, whenever you add or drop a set of rows, only that small part (unless it involves hundreds or more rows) of the index/table would need to be modified. My comments were geared towards that design.

Remember, with a clustered index, the "table itself", i.e. the data rows, is the leaf-level of the clustered index--so, again, only that part of the index would ever need to be modified. With a non-clustered index on top of a heap, you still have those extra pages to maintain, and I think frequent delete/inserts would cause some serious table fragmentation.

If a second index existed on (Item_pk, Collection_pk), which would be necessary if you had to perform lookups by item, then it gets tricky. In this case:

  • For the same reasons, it would be more efficient to have a clustered index and a non-clustered index.
  • Absoutely, you will take a performance hit maintaining that second index, as insertion/deletion actions will occur througout the item-first index all the time.

It sounds like you don't have and don't need that second index, so don't worry about it.

Philip Kelley
the collections_item table is already over a million rows and is expected to grow at least ten fold over the next six months. also your comments discuss affects on the index, but im more concerned with overhead in reordering the table itself because of the clustered aspect of the index. am i missing something ? thanks.
Ian
Updated my answer based on Ian's (first) comment
Philip Kelley