views:

581

answers:

5

How are clustered indexes stored on a hard disk? What is the logical order?

How do non-clustered indexes work?

+7  A: 

This means that the data in the table are stored in a B-Tree according to the order of the CLUSTERED PRIMARY KEY (or the clustering columns).

This name is in my opinion a little bit confusing. The same concept in Oracle is called index-organized table which I find much more descriptive.

Non-clustered indexes contain the value of the indexed columns along with the pointer to the record they are originated from.

The "clustered index" is the table itself; the "non-clustered" index is an ordered copy of some of the table's columns.

If you "create" a clustered index, the table is rearranged. That's why you cannot have more than one "clustered index" on a table: the table cannot be arranged in more than one order.

If you create a secondary index, the shadow copy of the table is created, holding the values of the indexed columns and the pointers to the records they are from. Whenever the table changes, the copy is changed too (the engine takes care of that automatically).

Non-clustered table

id   col1   value
--   --     --
1    1      Data 1
6    1      Data 6
3    1      Data 3
7    2      Data 7
9    2      Data 9
5    2      Data 5

The table is not ordered.

Clustered table

id   col1   value
--   --     --
1    1      Data 1
3    1      Data 3
5    2      Data 5
6    1      Data 6
7    2      Data 7
9    2      Data 9

The table is ordered on id.

Clustered table with a secondary index

Table                      Index
id   col1   value          col1   id
--   --     --             --     --
1    1      Data 1         1      1
3    1      Data 3         1      3
5    2      Data 5         1      6
6    1      Data 6         2      5
7    2      Data 7         2      7
9    2      Data 9         2      9

The table is orderer on id, the index is ordered on (col1, id)

Quassnoi
where this ordered copy stores?
masoud ramezani
@masoud: If the table is "clustered" it is ordered. If it's not, it's not. This is stored in the database in both cases. Again, this name is misleading.
Quassnoi
What makes the language really confusing is the DB2 supports a clustering (often referred to as clustered) index on tables, which is what DB2 uses to try (not guarantee, but try) to keep the data in order, and which DB2 will use to physically reorder rows when reorganizing a table.
Adam Musch
+2  A: 

For non-clustered indexes, a separate file is created, which hold just the index fields, which has it's records placed in the logical index order. For clustered index, there is no separate file -- the data from the table itself (all the fields) is placed in the logical order of the index.

This makes lookup on the index faster (though it's really best of indexes like dates where you'll be looking up a range). It also makes inserts rather slow in the record will be inserted in the middle.

James Curran
I'm not sure I would use the word "file" here. I know you're speaking logically, but physically, that's not necessarily the case, and could be confusing.
Dave Markle
what is the logic of this logical order?
masoud ramezani
"file" is meant theortically. It's just one area of the big database file (although that file is really it's own file system managed by the database server).
James Curran
The "logical order" is the order of the index -- that depends on whatever fields are choosen for the index and what direction.
James Curran
A: 

it means that the table is ordered as specified for the clustered index. Non clustered index are physically stored separately.

sergiom
+1  A: 

It means that a clustered index determined the physical order in which records in a table are actually stored. Non-clustered indices are simply lists of key values stored separately that enable fast lookups in other orderings than the clustered/physical ordering.

Quick example: a table with ID (primary key), FirstName, LastName and Car containing three people: 0=The Stig (Llana), 1=Jeremy Clarkson (DB9), 2=Richard Hammond (911), 3=James May (Lambo) and a clustered index on LastName and a non-clustered index on Car would store the actual lines of data in the table in this physical order on disk:

ID FirstName LastName Car
1  Jeremy    Clarkson DB9
2  Richard   Hammond  911
3  James     May      Lambo
0  The       Stig     Llana

The nonclustered index would also store something like:

Car   ID
911   2
DB9   1
Lambo 3
Llana 0
peSHIr
thanks peSHIr. please give me more detail for analyzing this sample.
masoud ramezani
@masoud ramezani: I have no idea what you mean by "analyzing this sample" or what detail you would need for that...
peSHIr
+1  A: 

Clustered Index Storage

Clustered indexes fundamentally work the exact same way that all other indexes work -- they're stored inside a variant of a struture called a B-Tree. They're stored in the same files, with the same formats as all of your other tables in SQL Server.

The Concept

Step back and think of the data that you're indexing. (I want you to think of a book in this analogy). What if, in addition to having indexes at the end of the book, you also ordered the data inside the book as well? You could look up information much faster. Take, for example, a phone book where all of the data is ordered by last name and first name. You don't have to go to the back of the phone book to find someone's number. Contrast that with a history book, where you have to go to the index at the back of the book to find what you want.

So logically, a clustered index (or "index-organized table" in Oracle) is your data, but sorted. Physically, the leaf nodes of the B-tree contain all of your table's data, in sorted order. This is really helpful when you are scanning the data in your table on a contiguous range, such as a date range.

Another important thing about clustered indexes (in SQL Server at least) is that your clustering columns (that is, the columns that make up how you sort your clustered index) are included at the end of each nonclustered index you define on your table. This makes searching for your clustering columns very fast, and this is often very desirable in OLAP databases.

Nonclustered Indexes

Your table can only be stored in one physical order. But certain times you need to look up data in other ways. For these scenarios, you use a nonclustered index. This is implemented as a B-Tree as well, but it doesn't have any bearing on the order of your table's data, like a clustered index does. That means that if you want data from your table which is not included in your non-clustered index, SQL Server will have to physically look up the data in your table in order to get what you want. This is another operation, and for many queries can be costly, and is a key design consideration when you optimize your tables.

A word

You could write a book on this stuff. Many have. If I haven't bored you to death already, check out Wikipedia's B-Tree page. Start there. If you're still (really) interested, I suggest actually programming a simple B-Tree so you can see what's involved. And, if you want to know even deeper details on how exactly SQL Server stores all of this, check out Kalen Delaney's Inside SQL Server: The Storage Engine. Is all of this learning overkill? That's for you to decide. But the more you study this, the more comfortable you will be with DB development, and the faster your systems will become. I promise.

Dave Markle