ansaurus

Question

Is this a good place for a clustered index for ms sql 2008?

Answer 1

+2 A:

According to Kimberly L. Tripp's Blog, a clustered index should be:

unique
narrow
static
ever increasing

So, let's evaluate your proposed 'revision number' in light of these criteria.

unique - This will depend on your answer to Ed Harper's comment. If it's not unique on it's own, it seems like the combination of revision number + file would be.
narrow - Assuming revision is something like an integer, you're OK here. If you need to go to revision number + file for uniqueness, and if the file has an ID that's another integer, you're still OK.
static - It sounds like a revision is never changed once it is created, so you're good here.
Ever increasing - I'm reading between the lines a little bit now, but I think your new revisions are probably created in this way.

In conclusion, depending on the uniqueness of revision numbers, it seems like either revision number or revision number + file ID would be a good choice for the clustered index.

Joe Stefanelli 2010-09-20 13:33:57

Please see my reply above. It is not unique and the revision number is an int.

MSSQL 2010-09-20 13:35:41

Then, assuming you have a file ID (integer) to work with, I'd cluster on Revision Number + File ID.

Joe Stefanelli 2010-09-20 13:42:37

Answer 2

A:

On top of Joe Stefanelli's answer, I'd add:

How is the table being used?
Is it merely a dump or log of activity, is it used for OLTP purposes (lookup a few rows at a time), is it used for OLAP-like activity (read many, many rows at a time)?
Is performanace critical (must retrieve rows in microsecond time) or secondary (say for end-of day reports)?

Since you only get one clustered index, I'd tailor the clustered index based on these answers so that is best supports system requirements. Some ideas:

If it's a daily log that is rarely if ever queried, a clustered index on only RevisionNumber would be adequate.

If you report on all files loaded on a given day, the clustered index on RevisionNumber would be ideal.

If you need to look up individual files with any kind of frequency or expediency, that index would suck since, if I've got it right, there'd be 100,000,000 rows (files) for each RevisionNumber -- but a simple non-clustered index on FileName, or FileName + RevisionNumber, would cover that (but see the next idea).

For fast lookups, FileName, FilePath, or FilePath + FileName could be painfully long strings to index. Adding a column (or a persisted calculated column) for checksum(FileWhatever) and indexing on that could save a lot of time. Queries would have to look something like:

SELECT FullFileName, Plus, Other, Columns
 from FileTable
 where RevisionNumber = @TargetRevision
  and ChecksumColumn = checksum(@TargetFullFileName)
  and FullFileName = @TargetFullFileName

Lastly, if you're really adding and dropping about 100,000,000 rows every day, I'd serious look into table partitioning, with the partition based on RevisionNumber.

Philip Kelley 2010-09-20 14:17:31

ansaurus

tags:

views:

answers:

Is this a good place for a clustered index for ms sql 2008?

related questions