views:

214

answers:

4

I use SQL Server 2005. I have a simple logging table that my web application uses to track user activities and urls visited. The table design is very simple

ID (identity value),

LogDate (datetime),

Activity (nvarchar(200)),

Url (nvarchar(1000))

We mainly do Inserts into this table. Once in a while, our perform some queries against this table if we want to investigate a particular user's activities over a date period. The table currently has an identity column as its primary key. This is also its clustered index.

I'm wondering if it is better for me to change its clustered index into the LogDate column. The LogDate column stores the date/time of the activity and can have duplicates, but since we are always inserting into the table, new records should be always at the end of the table, so there is no reason for SQL Server to have to regorganise or do page splits that would impact Insert performance. Having the LogDate column as the clustered index should also help search performance.

Please let me know if my reasoning is correct. Thank you!

A: 

I think the best option is to use the "Database Engine Tuning Wizard". This will take care of selecting the best choice for your case. You just need to run it long enough to cover your different cases in order for its analysis to best match your real life case.

You can find more at http://msdn.microsoft.com/en-us/library/ms189303%28SQL.90%29.aspx (for SQL server 2005)

Emad
The DTA does a reasonable job, but it will occasionally suggest the 'wrong' indexes.
Mitch Wheat
+2  A: 

Yes, your reasoning is correct, provided the rate of inserts is much less than the granularity of datetime (3.33ms)

SQL Server 2008 has a new datatype, DATETIME2, with higher precision (100 nanoseconds).

If you leave a reasonable amount of free space (FILLFACTOR between 80-90), and rebuild the index regularly (once a week) all should be well.

Mitch Wheat
A: 

The clustered index must be unique, so you have two options if you want a LogDate searchable index (I believe you do.)

1) use "id" as the clustered index, and add a nonclustered index on LogDate.

2) use "LogDate, id" as a clustered index (guaranteed uniqueness on the basis of id)

If you have a column such as id, one would generally expect it to be indexed, so even in the case of (2) you may need a nonclustered index on id.

I wouldn't worry too much about the inefficiency on insert times of adding an additional nonclustered index, but if you are optimising for a very high frequency of transactions then maybe you might have to consider this.

One other thing to note with log tables is that if all transactions (all writes to all tables) write to a single audit log table, then you are effectively creating a write mutex on the database, such that all transactions may have to be serialised on write locking that table.

Note that row-level locking and read committed/snapshot isolation modes reduce these factors, particularly for readers...

polyglot
The following is incorrect: "The clustered index must be unique". Read the following article discussing non-unique indexes and uniquifiers: http://www.sqlskills.com/blogs/paul/post/Indexes-From-Every-Angle-What-happens-to-non-clustered-indexes-when-the-table-structure-is-changed.aspx
AlexKuznetsov
Quoting from that article: "Non-unique clustered index: Every record in a clustered index HAS to be unique, otherwise there would be no way to deterministically navigate to a particular record using the index b-tree. In a non-unique clustered index, SQL Server has to add a special column, called the uniquifier, to each record, so that if multiple records have the same cluster key values, the uniquifier column will be the tie-breaker. This uniquifier column is added as part of the cluster key, and so it is also present in all non-clustered index records as part of the logical RID.
AlexKuznetsov
+1  A: 

Before choosing clustering indexes, we need to set priorities. What is more important: speeding up infrequent selects or minimally slowing down frequent inserts? If your inserts are more important, keep the existing clustering index.

AlexKuznetsov
Thanks your advice. This is something that I should also consider.So, what you're saying is that if I keep the ID as my clustered index, I will get slightly better Insert performance than if I chset the clustered index to LogDate?
janem
@janemoreno. Yes, because inserting a row with duplicate LogDate will require a uniquifier (if it is clustered on LogDate), it will be somewhat slower.
AlexKuznetsov