What do I miss in understanding the clustered index?

views:

answers:

What do I miss in understanding the clustered index?

In absence of any index the table rows are accessed through IAM ((Index Allocation Map).
Can I directly access a row programmatically using IAM?

Does absence of index mean that the only way to read specific row is full table scan reading all table?
Why IAM cannot be engaged for more specific direct access?

"If the table is a heap (in other words, it has no clustered index), the bookmark is a row identifier (RID), which is an actual row locator in the form File#:Page#:Slot#" [1a]

There was no further definition of slot. Well, other sources tell that Slot# is really row number. Correct? or some further juxtaposing with IAM needed to determine specific row?

Now, introduction of clustered index means that no data can be directly accessed but only through eventually clustered index lookup or traversing clustered leaf nodes sequentially.

Do I understand correctly that introduction of clustered indexes is beneficial only for selecting continuous adjacent (ranges of) rows and only through clustered index keys?
Which else benefits are in clustering a table?

Do I understand correctly that clustered index introduction worsen the performance benefits of non-clustered indexes engagement for non-exact match queries? No direct access, sequential access cannot be parallelized, non-clustered indexes are increased by clustered index keys, etc., correct?

Well, I see that clustering a table makes sense for quite specific and well understood contexts while creation of primary keys always default in clustering a table. Why is it?

What do I miss in clustered indexes understanding?

[1]
Inside Microsoft® SQL Server™ 2005: The Storage Engine
By Kalen Delaney - (Solid Quality Learning)
...............................................
Publisher: Microsoft Press
Pub Date: October 11, 2006
Print ISBN-10: 0-7356-2105-5
Print ISBN-13: 978-0-7356-2105-3
Pages: 464

[1a] p.250 Section Index organization from Chapter 7. Index Internals and Management

Here is helpful online copypaste from it
http://sqlserverindexeorgnization.blogspot.com/
though without any credits to source

Related questions:

Update: @PerformanceDBA,

"please, forget the doco you reference and start again"

Starting me again on the basis of what?
Any references, any advices. techniques how to start again?

**"A Clustered Index is always better"

Can you answer my question Why/when/how is whole clustered index scan chosen rather than full table scan? The doubt is what is the meaning of Full Clustered Index Scan. Does not it read more than Full Table Scan?

""If there is an IAM, then there is an Index"

So, there is no IAM if there is no index at all?
There is IAM if there is CI?

How am I supposed to verify/study it?
if all docs write the opposite:
- there is IAM on non-indexed table
- there is no IAM if there is clustered index.

+1 A:

That's a lot of questions. Yes the IAM is used to look up pages on a heap. The difference is that without an index there is no way to know what pages to retrieve for any given piece of data. An important feature of the SQL / relational model of data is that queries access data by data values only - never by using pointers or other structures directly.

A slot number just identifies a row within a page. Row data is not logically ordered within a page, even in a clustered index. Each data page contains a row offset table that points to the position of rows within a page.

A clustered index can slow down data access from nonclustered indexes because of the additional bookmark lookups required. This can be mitigated by using the INCLUDE clause to add columns to a NC index. Sometimes it may be more efficient not to have a clustered index on a table.

dportas 2010-10-29 08:46:46

+1 A:

Please read my answer under "No direct access to data row in clustered table - why?", first.

If there is an IAM, then there is an Index.

But if the is no CI, then the rows are in a Heap, and yes, if you want to read it directly (without using an NCI, or where no Indices exist), you can only table scan a Heap.

A Clustered Index is always better that not having one. There is one exception, and one caveat, both for abnormal or sub-standard conditions:

Non-Unique CI Key. This causes Overflow Pages. Relational tables are required to have unique keys, so this is not a Relational table. The CI can be made unique quite easily by overloading the columns. A Non-unique CI is still better (as per my other post) to have a Non-unique CI than no CI.
Monotonic Key. Typically an IDENTITY column. Instead of random Inserts which insert rows distributed throughout the data storage structure (as is normal with a "good" natural Relational key), the inserted Key is always on the last page. This causes an Insert hotspot, and reduces concurrency. Relational keys are supposed to naturally unique; the surrogate is always an additional index. A surrogate-only is simply not a relational table (it is group of Unnormalised spreadsheets with row identifiers linking them together; you will not get th epower of a databse from that). .
So the standing advice is, use an NCI for monotonic keys, and ensure that the CI allows good data distribution.

The advantages of CIs are vast, there is no good reason to have one (there may be bad reasons as alluded to above).

CIs allow range queries; NCIs do not. But that is not the only reason.

Another caveat is you need to keep the width of the CI Key small, because it is carried in the NCIs. Now normally this is not a problem, as in, wide CI keys are fine. But where you have an Unormalise dbunch of spreadsheets masquerading as a database, which results in many more indices than a Normalised database, that does become a consideration. Therefore the standing advice for Empire devotees is, keep the width of the CI key down. CIs do not "increase" the NCIs, that is not stated accurately. If you have NCIs, well, it is going to have a pointer or a CI key; if you have a CI (with all the benefits) then the cost is, a CI key instead of a RowId, negligible. So the accurate statement is, fat wide CI keys increase the NCIs.

Whoever says sequential access of CIs cannot be parallelised is wrong (MS may break it in one version and fix it in the next, but that is transient).

Using the ANSI SQL ...PRIMARY KEY ... notation defaults to UNIQUE CLUSTERED. because the db is supposed to be Relational. And the Unique PK is supposed to be a nice friendly Relational key, not a idiotic IDENTITY column. Therefore invariably (not counting exceptions) the PRIMARY KEY is the best candidate for clustering.

You can always create whatever indices you want by avoiding the ANSI SQL ...PRIMARY KEY ... notation and using CREATE [UNIQUE] [CLUSTERED] INDEX notation instead.

It is not possible to answer that last question of yours, you will have to keep asking questions until you run out. But please, forget the doco you reference and start again, otherwise we will be here for days discussing the difference between clear knowledge and gobbledegook.

PerformanceDBA 2010-10-29 12:40:38

I did read, see my Update in the main post. I shall keep asking but not in the same thread, it is against the rules here (it is not discussion forum)

vgv8 2010-10-29 13:35:56

+1 A:

Just wanted to plug a large set of blog posts on indexes. Maybe some of them address some of the questions here.

http://michaeljswart.com/?p=905

Michael J Swart 2010-10-29 12:56:59

ansaurus

tags:

views:

answers:

What do I miss in understanding the clustered index?

related questions