ansaurus

Question

Answer 1

+1 A:

Books Online is the best source!

The whole Database Engine - Planning and Architecture - Tables and Index Data Structures Architecture is very good internal introduction.

From this link you can download a local copy of Books Online(it is free). It is the best (and official) reference to all Sql 2008 questions.

Svetlozar Angelov 2009-08-27 14:17:58

Answer 2

+6 A:

Heap storage has nothing to do with these heaps.

Heap just means records themselves are not ordered (i. e. not linked to one another).

When you insert a record, it just gets inserted into the free space the database finds.

Updating a row in a heap based table does not affect other records (though it affects secondary indexes)

If you create a secondary index on a HEAP table, the RID (a kind of a physical pointer to the storage space) is used as a row pointer.

Clustered index means that the records are part of a B-Tree. When you insert a record, the B-Tree needs to be relinked.

Updating a row in a clustered table causes relinking of the B-Tree, i. e. updating internal pointers in other records.

If you create a secondary index on a clustered table, the value of the clustered index key is used as a row pointer.

This means a clustered index should be unique. If a clustered index is not unique, a special hidden column called uniquifier is appended to the index key that makes if unique (and larger in size).

It is also worth noting that creating a secondary index on a column makes the values or the clustered index's key to be the part of the secondayry index's key.

By creating an index on a clustered table, you in fact always get a composite index

CREATE UNIQUE CLUSTERED INDEX CX_mytable_1234 (col1, col2, col3, col4)

CREATE INDEX IX_mytable_5678 (col5, col6, col7, col8)

Index IX_mytable_5678 is in fact an index on the following columns:

col5
col6
col7
col8
col1
col2
col3
col4

This has one more side effect:

A `DESC` condition in a single-column index on a clustered table makes sense in `SQL Server`

This index:

CREATE INDEX IX_mytable ON mytable (col1)

can be used in a query like this:

SELECT  TOP 100 *
FROM    mytable
ORDER BY
       col1, id

, while this one:

CREATE INDEX IX_mytable ON mytable (col1 DESC)

can be used in a query like this:

SELECT  TOP 100 *
FROM    mytable
ORDER BY
       col1, id DESC

Quassnoi 2009-08-27 14:19:19

So, is inserting a row into a heap faster/slower/same as inserting a row into a B-Tree?

PilotBob 2009-11-11 22:07:02

Inserting into a heap is *a little bit* faster than that to a `B-Tree` if no page split occurs on the latter, and *much* faster if the page split occurs (i. e. there is no place to enter the new row and the data should be redistributed between the leaves).

Quassnoi 2009-11-11 22:41:36

Damn you're good, Quassnoi... you know how much garbage exists on the net regarding Clustered indexes...

Stephanie Page 2010-08-09 22:31:24

Answer 3

+2 A:

The Inside MS Sql Server series is pretty good. They have a book on the Storage Engine. The books are written around SQL Server 2005 but it should be pretty much the same.

http://www.amazon.co.uk/Inside-Microsoft-SQL-Server-2005/dp/0735621055

pjp 2009-08-27 14:21:55

Answer 4

+2 A:

Heaps are just tables without a clustering key - without a key that enforces a certain physical order.

I would not really recommend having heaps at any time - except maybe if you use a table temporarily to bulk-load an external file, and then distribute those rows to other tables.

In every other case, I would strongly recommend using a clustering key. SQL Server will use the Primary Key as the clustering key by default - which is a good choice, in most cases. UNLESS you use a GUID (UNIQUEIDENTIFIER) as your primary key, in which case using that as your clustering key is a horrible idea.

See Kimberly Tripp's excellent blog posts GUIDs as Primary and/or the clustering key and The Clustered Index Debate Continues for excellent explanations why you should always have a clustering key, and why a GUID is a horrible clustering key.

My recommendation would be:

in 99% of all cases try to use a INT IDENTITY as your primary key and let SQL Server make that the clustering key as well
exception #1: if you're bulk loading huge data amounts, you might be fine without a primary / clustering key for your temporary table
exception #2: if you must use a GUID as your primary key, then set your clustering key to a different column - preferably a INT IDENTITY - and I would even create a separate INT column just for that purpose, if no other column can be used

Marc

marc_s 2009-08-27 14:34:26

ansaurus

tags:

views:

answers:

SQL Server heap v.s. clustered index

Updating a row in a heap based table does not affect other records (though it affects secondary indexes)

Updating a row in a clustered table causes relinking of the B-Tree, i. e. updating internal pointers in other records.

By creating an index on a clustered table, you in fact always get a composite index

A `DESC` condition in a single-column index on a clustered table makes sense in `SQL Server`

related questions

ansaurus

tags:

views:

answers:

SQL Server heap v.s. clustered index

Updating a row in a heap based table does not affect other records (though it affects secondary indexes)

Updating a row in a clustered table causes relinking of the B-Tree, i. e. updating internal pointers in other records.

By creating an index on a clustered table, you in fact always get a composite index

A DESC condition in a single-column index on a clustered table makes sense in SQL Server

related questions

A `DESC` condition in a single-column index on a clustered table makes sense in `SQL Server`