views:

87

answers:

3

Excerpt from: http://www.sqlservercentral.com/articles/Indexing/68563/

The width of the clustering key does not, however, only affect the clustered index. The clustering key, being the rows’ address, is located in every single nonclustered index. Hence a wide clustering key increases the size of all nonclustered indexes, reducing their efficiency as well

Why should the size of the pointer to the datapage in the non-clustered index increase if the sizeof(int *) is the same as the sizeof(char *). Or is there some other addressing mechanism I am not aware of?

+3  A: 

The pointer does not point to the data page: it points to the clustered index. Subtle, but different to where there is no clustered index where each NC index entry points to the Row ID (RID) in the data pages.

So if you have char(10) as your key, each NC index entry has a 10 byte pointer. If you have integer, then it's 4 bytes.

For char vs int, there are other considerations: collation at least (which determines case, accent, kana and width sensitivity).

And you string of 4 bytes only applies fot char(4). nchar(4) is 8 bytes, varchar(4) is 2 to 6 bytes (2 byte length), nvarchar(4) is 2 to 10 bytes.

gbn
Exactly, it does point to the datapage stored in the leaf of the clustered index. And the size of that pointer is the same whether or not the datatype stored is int or char. Note that I am not talkig about sizeof(int) here but sizeof(int *) which is the pointer address value
tsps
No the pointer *is* the length of the clustered key. This is not c++: it's a data structure
gbn
A: 

In a clustered table, the value of the clustering key is a row pointer and hence implicitly appended to each record.

If (col1, col2, col3) is a clustered key, then an index on col4 is in fact an index on (col4, col1, col3, col3).

Its size of course depends on the sizes of col1, col2 and col3.

Quassnoi
No, it isn't. col1, col2, col3 are only contained on the leaf level of the index on col4. Not on any intermediate level of that index. but for all practical purposes... :-)
Frank Kalis
`@Frank Kalis`: you want to say that if we have `1,000,000` pages worth of primary keys and the only value of `col4` is `1`, then we would have one root node and `1,000,000` leaf nodes?
Quassnoi
If 1 would be the only value for col4, I would seriously rethink my indexing strategy...But no, that's not what I am saying. Don't confuse leaf nodes with leaf-rows. Since {col1, col2, col3} are different for each row by definition and they make up the clustered index, there must be as many entries for them in the index on col4 as there are rows in the table. The index on col4 would certainly have only 1 root level, have 1,000,000 leaf-level rows and an index depth > 2.
Frank Kalis
`@Frank Kalis`: note that I didn't write "`1,000,000` records", I wrote "`1,000,000` pages worth of `PRIMARY KEYs`". This means we will have `1,000,000` leaf level pages (not records). So what is stored in the intermediate index levels then?
Quassnoi
Ah, I see now what you mean and I must admit that I successfully confused myself. An entry on a intermediate level of a clustered index contains the key value along with either a pointer to another index row on another intermediate level or a pointer to the data row on the leaf level.For a nonclustered index each index row contains the key value along with the row locator, which is either the clustered index key (if a clustered index is present) or the RID when the table is a heap.
Frank Kalis
`@Frank Kalis`: to put this all together, if we have the branch-level index records, then the upper-level records (root or higher branches) should contain references to these records, *taking the separation values into account*. The primary key values (or the values of the clustered keys along with the server-generated `48-bit` uniquifier if the clustered index is not unique) serve as the separation values too, that's why they are stored as the keys in the intermediate level index records as well.
Quassnoi
Hm, I think we've got it now. Except for the fact, that the Uniqueifier is not 48-bit, but 4 bytes. :-)
Frank Kalis
`@Frank Kalis`: right, it's `32-bit`.
Quassnoi
A: 

To add to gbn's answer, you may want to have a look at these two links:

Overhead in Non-Unique Clustered Indexes

Forwarding and forwarded records, and the back-pointer size

Frank Kalis
Looks like the back pointer issue is addressed here. This will take me sometime to understand. :)
tsps
Be careful: The backpointer is something different again. I just posted this link, because Paul touches the general concepts from "As an aside, when a query uses a nonclustered index to satisfy a query,..." onwards.
Frank Kalis