views:

851

answers:

3

In a data warehouse, are there disadvantages to creating clustered indexes on fact tables? (most of the time, it will be on the datetime column)

Would you answer yes or no "by default..."?

If I shouldn't create clustered indexes by default, then why? (I know the pros of clustered indexes, but what are some cons?)

References

http://blogs.sqlserver.org.au/blogs/greg_linwood/archive/2006/09/11/365.aspx

+2  A: 

I would always suggest having a clustered index on a table (transaction or warehouse) that is searched by a given value frequently. The downside to a clustered index (or any index) is that you are creating an additional store of data that takes up space. If the table that is being indexed is huge...the index will be too! The more indexes you have the more data you are storing in addition to the database. However, if you need speed for your searches then you may need an index to help gain that speed.

However, you may also look to creating a clustered index on the ID of your table. And then create indexes outside of the database in a product such as Lucene (or Lucene.NET). Then you can search your Lucene index (which has way more flexibility and features when it comes to searching) which would return the ID of a given record (or records) which you can then use to identify the data that you need in your database. This is a route that we have used quite a bit in my current project and I must admit it works quite slick! Creating the indexes is considerably faster (especially when compared to using FullText options in SQL Server). Just something to consider.

Andrew Siemer
Is it really additional storage? Since it is arranging the physical rows according to the indexed columns, I thought it would simply be a rearrangement of the data + some overhead of additional pointers to other pages at that level of the b-tree.
Jeff Meatball Yang
Clustered index is the same storage sorted by the column you build the index on (usually a key...but not always). A non-clustered index is additional storage of the columns being indexed with a pointer to the row(s) that the data references.
Andrew Siemer
Actually a clustered index IS the data in the table, so it's not additional storage. It just puts the data in a tree instead of a heap. In terms of total storage requirement, clustering the table might be smaller or larger than having a heap depending on whether the clustered index key is narrower or wider than the physical address used for referencing rows in a heap. That said, this answer is right - you do want a clustered index for fact tables.
onupdatecascade
+1  A: 

I hate to say always and never, but I think in most cases you will find using clustered indexes on your fact tables a good thing. It's not recommended to create a clustered index on a column type that is inserted on with values that can be random. For example, you would not want a clustered index on something like a GUID. Dates and sequential ID's are great for clustered indexes especially for range queries.

You've stated that you know the pros of the clustered indexes but as far as the cons are concerned will really be determined by how you use them as to whether they are cons for you.

1.) You can have only one clustered index per table, because it changes the layout of the data blocks to match the order of the index. This is why clustered indexes are great for range queries

2.) Because the data blocks are arranged by the order of the index, inserts and updates that change the order of the key can cause physical modifications to the data blocks in order to keep them in order with the index. Inserting the key value in sequential order can alleviate this con.

RC
I was just doing some digging on the Interwebs, and found this article about why clustered indexes are not faster than nonclustered indexes for scanning ranges: http://blogs.sqlserver.org.au/blogs/greg_linwood/archive/2006/09/11/365.aspx
Jeff Meatball Yang
A: 

Having an int (bigint) auto-increment PK-clustered index greatly simplifies partitioning; and sooner, or later a fact table gets to this point. So, even if you think you may not need it now, create one.

Damir Sudarevic