views:

52

answers:

2

Suppose I need to update myTab from luTab as follows

update myTab
  set LookupVale = (select LookupValue from luTab B
                                       where B.idLookup = myTab.idLookup)

luTab consists of 2 columns (idLookup(unique), LookupValue)

Which is preferable : a unique clustered index on idLookup, or one on idLookup and Lookupvalue combined? Is a covering index going to make any difference in this situation?

(I'm mostly interested in SQL server)


Epilogue :

I followed up Krips tests below with 27M rows in myTab, 1.5M rows in luTab. The crucial part seems to be the uniqueness of the index. If the index is specified as unique, the update uses a hash table. If it is not specified as unique, then the update first aggreates luTab by idLookup (the Stream Aggegate) and then uses a nested loop. This is much slower. When I use the extended index, SQL is now no longer assued that that LookupValue is unique so its forced down the much slower, stream aggregate-nested loop route

+2  A: 

Firstly:

  • A covering index is always non-clustered
  • You should always have a PK and a clustered index (there are the same by default on SQL Server)

The 2 concepts are separate

So:

  • Your PK (clustered) would be idLookup if this uniquely identifies a row
  • The covering index would be (idLookup) INCLUDE (LookupValue)

However:

  • idLookup is the PK (clustered), so you don't need a covering index
  • the clustered index (PK) is implicitly "covering" by the nature of a clustered index (simply, index is data at the lowest level)
gbn
+1 good points!
marc_s
+1  A: 

I've created your tables and loaded just a few records (50 or so lookup, and 15 in myTab).

Then I've tried various index options. The Index Seek on luTab always has a cost of 29%.

The interesting bit is that if you add in the LookupValue column to the index on luTab the execution plan shows two extra steps after the Index Seek: Stream Aggregate and Assert. While cost is 0%, that may go up with more data.

I've also tried a nonclustered index on just idLookup, and including LookupValue as an 'Included Column'. That way the data pages don't need to be accessed to retrieve that that column. That may be an option for you although the execution plan doesn't show anything different (but they don't have the Stream Aggregate / Assert either).

-Krip

Krip
I'm working on SQL 2000 so I don't have the 'include' option, but it sounds like its targeted at this issue. I should probably do what you've done : look at the query plan.
cindi