views:

244

answers:

3

Hi folks,

I'm going to do an indexed view, based on three tables with inner and outer joins between them (SQL Server 2005). I will run all kind of queries against this view. So, I wonder what is the best way to choose which index to be clustered. What are the criteria or is there any tools to help me around.

(Sorry if my question is dull, I don't have a lot of experience in designing databases).

Thanks in advance!

EDIT: I should make clarification here, that the tables I use in the view are with very intense use and any overhead I take for maintenance of the indexes, should be paid-off.

+4  A: 

Since it's an index, you have to pick a column (or set of columns) which is guaranteed to be non-null and unique in all cases. That's the biggest and most stringent criteria - anything that might be NULL or duplicate is out of the question right from the get-go.

Depending on the type of queries you'll be running on this indexed view, you might also want to see if you have any columns (e.g. a DATE or something) that you'll be running range queries against. That might make an interesting candidate for a clustering key.

But the main thing is: your clustering key must be unique and non-null in any circumstance. And in my personal experience, to reduce index size (and thus increase the number of entries per page), I'd try to use as small a key as possible - a single INT is best, or a combination of two INTs - or possibly a GUID - but don't use VARCHAR(500) fields in your clustering key!

UPDATE: to all those poster who keep telling us clustered indexes don't need to be unique - check out what the "Queen of Indexing", Kimberly Tripp, has to say on the topic:

Let's start with the key things that I look for in a clustering key:

* Unique
* Narrow
* Static

Why Unique?
A clustering key should be unique because a clustering key (when one exists) is used as the lookup key from all non-clustered indexes. Take for example an index in the back of a book - if you need to find the data that an index entry points to - that entry (the index entry) must be unique otherwise, which index entry would be the one you're looking for? So, when you create the clustered index - it must be unique. But, SQL Server doesn't require that your clustering key is created on a unique column. You can create it on any column(s) you'd like. Internally, if the clustering key is not unique then SQL Server will “uniquify” it by adding a 4-byte integer to the data. So if the clustered index is created on something which is not unique then not only is there additional overhead at index creation, there's wasted disk space, additional costs on INSERTs and UPDATEs, and in SQL Server 2000, there's an added cost on a clustereD index rebuild (which because of the poor choice for the clustering key is now more likely).

Source: http://www.sqlskills.com/blogs/kimberly/post/Ever-increasing-clustering-key-the-Clustered-Index-Debateagain!.aspx

marc_s
So, the clustered index should act like a primary key in the tables? By the way, thanks for the revision of "SQL Server" name. I'm still little confused by all the names that Microsoft keep coming up with :)
anthares
SQL Server will use your primary key on a table as its clustering key by default. A view doesn't have a primary key per se - but the clustering key which you can set up should also uniquely identify each row - that's its job.
marc_s
I think both claims in this answer are wrong: a clustered index does not have to be unique, and a clustered index does not occupy any space on disk, so reducing clustered index size does not have any effect
Andomar
+1  A: 

The thumb rule: Select the columns which are you are probably going to use MOST in your queries as WHERE, GROUP etc. Those columns could be a good candidate for non-clustered indexes. Select a column (or a group of column) which would probably make your row unique, and that could be a good candidate for clustered index.

As mentioned by marc, a clustered index imposes a unique constraint, so it definately needed that the column you selct should not have any null and duplicate.

Bhaskar
A: 

A clustered index does not have to be unique. The columns in it can even be nullable. For example, this will run without an error:

create table  #test (col1 int identity, col2 int)
create clustered index ix_test on #test (col2)
insert into #test (col2) values (1)
insert into #test (col2) values (1) -- Duplicate in clustered index
insert into #test (col2) values (null)

A clustered index is part of the table structure on disk. As such, a clustered index uses no additional disk space.

By default, SQL Server clusters on the primary key, which is usually a good choice. You can change that if you have intensive queries with a lot of table lookups. Changing which index is clustered can eliminate table lookups.

Andomar
I checked myself that there is no problem creating clustered index with duplicate keys ... The question is, what is the best practice and how can I take the best of my clustered index, so to speak.I should make clarification here, that the tables I use in the view are with very intense use and any overhead I take for maintance of the indexes, should be paid-off.
anthares
The general advice is: make your primary key the clustered index. To give specific advice, we'd need far more information, at least table layout, relations, and the queries that run
Andomar
@Anthares: **YES** the clustered index must be unique! That's how SQL Server actually finds your data. If you supply a column or set of columns which is not guaranteed to be unique, SQL Server will add a 4-byte uniquifier to your clustering key. Read Kim Tripp's blog - esp. this entry here: http://www.sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-Continues.aspx
marc_s
@Andomar: and sure - the clustered index also builds an index tree and thus takes up space! Yes, the leaf level of a clustered index are the actual data pagges - but the intermediary pages of the clustered index are indeed index pages, which will take up space.
marc_s
@Andomar: also, if you have a number of non-clustered indices (not likely on a clustered view, but certainly on a table), the clustering key (all of it) will be added to each and every entry of each and every non-clustered index. If the clustering key is unnecessarily big, you'll be wasting **a lot of space** , not just on disk, but also in SQL Server main memory!
marc_s
@marc_s: I agree that a clustered index uses a small amount of extra disk space, but it's neglegible compared to a nonclustered index
Andomar