ansaurus

Question

Why to add clustered index keys to all (intermediate) nodes on NCI?

Answer 1

+3 A:

You are asking questions re what others have said without any understanding of the subject matter (IT; B-Trees; Index structures) re what they said, made statements about. This is an answer service, not a tutorial service.

"This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index"

No. Quassnoi said nothing of the sort. You cannot take statements (answers within a context; the question) and evaluate them in isolation. The CI key is only applicable to the leaf level, not the "intermediate nodes".

"And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)"

Logic ? No again. The determination that the elephants tail is made of thick, long hairs, does not imply that the trunk is made of hair also.

Ask another question re the non-leaf nodes of a non-unique, non-clustered index. I am getting a bit non-plussed about the non-issue.

Answer. For your now consistently evidenced level of understanding, the nonclustered index has the full clustered key value as the entry at the leaf level. Period. End of story. That is no big deal because (a) the number of steps are the same (b) the CI index (not leaf) will be in cache anyway, and thus very fast, without requiring disk access until the last (leaf level).

NCI key lookup, No CI: Index lookup -> RID -> Data row lookup -> Data Row

NCI key lookup, with CI: Index lookup -> CI key -> Clustered index lookup -> Data Row

What is the purpose of it?

Performance. All vendors understand that the slowest component in the chain of functions activated by a query, is the disk, the only component with moving parts. They all do their best to avoid disk access, and improve performance. The Index itself is the most basic structure for avoiding disk access, since the 1960's. The basic B-Tree has not changed since then, it has merely had a million tiny advancements.

Now the problem is, while that is true, each vendor has (a) has their own little special techniques that enhance (add to, without changing the basic operation as described in my posts to you) the operation and (b) in the MicroShifty world, it changes all the time, because the enhancements are, well, not really enhancements. Point being, that excruciatingly low level is not relevant to understanding how indices work; or whether a CI or NCI is good for your particular use; or the advantages/disadvantages of each.

I have already identified, in order to assist you, do not get involved in the lower levels, until you understand the basics, higher levels ... if you do, you will get lost, and it will be an obstacle to your presented intention of learning. As evidenced here. Again.

PerformanceDBA 2010-10-30 07:45:10

I like your answers and upvoted them all. There is no need to continue sending me to study the basics and evaluating the level of understanding (just by mere questions?!) in each your answer. I read all, understood, accepted and already started studying those basics and subject matters. Thanks!

vgv8 2010-10-30 14:56:31

Answer 2

+4 A:

This sounds like clustered key is added to (all) indermediate nodes of non-unique non-clustered index. And by the same logic RIDs are added to indermediate nodes in case of non-clustered table (?)

Yes, this is true.

This is done to improve the maintainability of an index.

Say, you have a secondary (non-clustered) index on column, 1,000,000 records with column = 1 and want to delete one of these records.

The record needs to be deleted from the index as well.

To locate the record to be deleted, a B-Tree search should be performed on the index. But if the branch nodes were not storing the value of the row pointer (be it a clustered key or a RID), the engine would have to scan all 1M records to determine the one to delete.

If the secondary key were UNIQUE, the value of the column would be enough to uniquely locate the node in the index, so storing the row pointer in the branch nodes is not required (and that's why they are not stored).

This discussion may be also interesting to you:

http://www.sqlservercentral.com/Forums/Topic714684-1545-6.aspx

Update:

To check the contents of the branch nodes, you can use DBCC IND:

CREATE TABLE t_clustered (id INT NOT NULL PRIMARY KEY, nval INT, uval INT)
CREATE TABLE t_nonclustered (id INT NOT NULL PRIMARY KEY NONCLUSTERED, nval INT, uval INT)

CREATE INDEX ix_clustered_nval ON t_clustered (nval)
CREATE UNIQUE INDEX ux_clustered_uval ON t_clustered (uval)
CREATE INDEX ix_nonclustered_nval ON t_nonclustered (nval)
CREATE UNIQUE INDEX ux_nonclustered_nval ON t_nonclustered (uval)
;
WITH    q(id) AS
        (
        SELECT  1
        UNION ALL
        SELECT  id + 1
        FROM    q
        WHERE   id < 10000
        )
INSERT
INTO    t_clustered
SELECT  id, (id - 1) / 10 + 1, id
FROM    q
OPTION  (MAXRECURSION 0)
;
WITH    q(id) AS
        (
        SELECT  1
        UNION ALL
        SELECT  id + 1
        FROM    q
        WHERE   id < 10000
        )
INSERT
INTO    t_nonclustered
SELECT  id, (id - 1) / 10 + 1, id
FROM    q
OPTION  (MAXRECURSION 0)

-- Replace mydb with your database name

DBCC IND (mydb, t_clustered, -1)
DBCC IND (mydb, t_nonclustered, -1)

In the output of these commands you should search for records with PageType = 2 (index page) and IndexLevel > 0 (non-leaf node) and find their PageID.

In my case, I got the following PageID: 21074, 21076, 21105, 21107. Note they are site specific: you will have the other values.

Then you should used DBCC PAGE to examine the contents of these pages:

DBCC PAGE (mydb, 1, 21074, 3)
DBCC PAGE (mydb, 1, 21076, 3)
DBCC PAGE (mydb, 1, 21105, 3)
DBCC PAGE (mydb, 1, 21107, 3)

FileId PageId      Row    Level  ChildFileId ChildPageId nval (key)  id (key)    KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId uval (key)  KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId nval (key)  HEAP RID (key)     KeyHashValue
FileId PageId      Row    Level  ChildFileId ChildPageId uval (key)  KeyHashValue

We see that the non-leaf nodes of the non-unique secondary index on nval contain record pointers (id (PRIMARY KEY CLUSTERED) and RID, appropriately), while those of the unique index on uval do not contain record pointers, only the values on the indexed column itself.

This is, again, because with a unique index the value of the column indexed is sufficient to locate its node in the index, while with a non-unique index it's not.

Quassnoi 2010-10-30 09:51:36

ansaurus

tags:

views:

answers:

Why to add clustered index keys to all (intermediate) nodes on NCI?

related questions