I have a SQL Server table with the following structure:

CREATE TABLE [dbo].[Log](
 [LogID] [bigint] IDENTITY(1,1) NOT NULL,
 [A] [int] NOT NULL,
 [B] [int] NOT NULL,
 [C] [int] NOT NULL,
 [D] [int] NOT NULL,
 [E] [int] NOT NULL,
 [Flag1] [bit] NOT NULL,
 [Flag2] [bit] NOT NULL,
 [Flag3] [bit] NOT NULL,
 [Counter] [int] NOT NULL,
 [Start] [datetime] NOT NULL,
 [End] [datetime] NOT NULL)

The table is used to log activities. Columns A-E are foreign keys, Flag1-Flag3 indicate certain log states, and the Start and End columns mark the beginning and end of an activity.

On average, this table is updated every ~30 seconds, and each update performs ~50 inserts/updates.

Users can query the table from the UI and filter the data on any single column or any combination of columns and column types.

What would be the best way to optimize data retrieval for this table:

  1. Create one "master" index that would hold all these columns
  2. Identify some of the most used filter combinations e.g. [A,D,E], [A, Start, End] etc. and create indexes for them
  3. Something else...
A: 

One approach is to let SQL Server tell you the optimal usage. Run a trace for a few minutes while the table is under "typical" usage, and then run the Database Engine Tuning Advisor against that trace.

Scott Weinstein
... and it returns 0 recommendations! :))))
Toni Frankola
+2  A: 

Log tables are rarely indexed, because indexing slows down INSERT, UPDATE, and DELETE statements.

I would recommend either:

  • loading the records into a table (temporary or actual, indexed) before filtering
  • using an indexed view

Basically - if speed/performance is a big concern, index the records in another form of table so the logging isn't impacted.
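The first option could look roughly like the sketch below. It assumes the `dbo.[Log]` table from the question; the temp table name `#LogSnapshot`, the index names, the seven-day window, and the filter values are all made up for illustration.

```sql
-- Assumed reporting window; adjust to whatever the UI actually asks for.
DECLARE @From datetime = DATEADD(DAY, -7, GETDATE());

-- Copy the rows of interest out of the log table once...
SELECT LogID, A, B, C, D, E, Flag1, Flag2, Flag3, [Counter], [Start], [End]
INTO #LogSnapshot
FROM dbo.[Log]
WHERE [Start] >= @From;

-- ...then index the copy, so the live log table carries no extra index cost.
CREATE CLUSTERED INDEX IX_LogSnapshot_Start ON #LogSnapshot ([Start]);
CREATE NONCLUSTERED INDEX IX_LogSnapshot_A_D_E ON #LogSnapshot (A, D, E);

-- Filtering now runs against the indexed copy (example values only).
SELECT LogID, A, D, E, [Start], [End]
FROM #LogSnapshot
WHERE A = 1 AND D = 2;
```

The trade-off is that the copy is only as fresh as the last load, so this fits reporting-style queries better than live monitoring.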

OMG Ponies
+3  A: 

I doubt anyone here can make anything but a guess - you need to record the usage of the table and see from that usage what combinations of columns are being queried for.

  1. Create one "master" index that would hold all these columns

That's definitely not a good idea. If you have an index on (A,B,C,D,E) and you restrict your query by values of B and D only, that index is totally useless for that query. It's only useful

  • if you frequently query by all five columns
  • if you frequently query by leading combinations like (A,B), (A,B,C), (A,B,C,D)

In any other case, it's a waste - don't use this.

  2. Identify some of the most used filter combinations e.g. [A,D,E], [A, Start, End] etc. and create indexes for them

Yes, that's really the only way that promises any success. You need to see what kind of queries actually happen, and then tweak for those.
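As a sketch only: if the recorded usage really did show the two combinations from the question, the matching indexes could be declared as below. The index names are hypothetical, and the column choices simply reuse the question's examples rather than any measured workload.

```sql
-- For queries that filter on A, D and E (or on A alone, or A and D).
CREATE NONCLUSTERED INDEX IX_Log_A_D_E
    ON dbo.[Log] (A, D, E);

-- For queries that filter on A and then a Start/End time range.
CREATE NONCLUSTERED INDEX IX_Log_A_Start_End
    ON dbo.[Log] (A, [Start], [End]);
```

Each extra index adds work to the ~50 inserts/updates per batch, so it is worth keeping the set small and dropping any index that the recorded usage does not justify.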

marc_s
+1  A: 

In any index combination, the inner keys cannot be used unless the outer key is also referenced. Say you have an index on (A,B,C,D):

  • WHERE A=@a AND B=@b AND C=@c AND D=@d will make full use of the index
  • WHERE A=@a may use the index to filter the range of rows to scan. Same for WHERE A=@a AND B=@b, WHERE A=@a AND C=@c etc. Any combination that has the leftmost column (A) in it may use the index.
  • WHERE B=@b cannot seek on the index. Nor can WHERE C=@c, WHERE D=@d, or any other combination that is missing A. In other words, if column A is not in the query restrictions, the index cannot be used to narrow down the rows.
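The three cases above can be tried out with something like the following. The index name and the parameter values are placeholders, and the comments describe what the optimizer can do, not what it is guaranteed to choose; checking the actual execution plan is the only way to be sure.

```sql
-- Hypothetical index matching the example in the text.
CREATE NONCLUSTERED INDEX IX_Log_A_B_C_D
    ON dbo.[Log] (A, B, C, D);

-- Placeholder values for illustration.
DECLARE @a int = 1, @b int = 2, @c int = 3, @d int = 4;

-- All four key columns restricted: a seek can use the whole key.
SELECT COUNT(*) FROM dbo.[Log]
WHERE A = @a AND B = @b AND C = @c AND D = @d;

-- Leading column A present: a seek on A is possible,
-- with C applied as a further filter inside that range.
SELECT COUNT(*) FROM dbo.[Log]
WHERE A = @a AND C = @c;

-- Leading column A missing: no seek is possible on this index;
-- at best the engine scans it (or the table) from end to end.
SELECT COUNT(*) FROM dbo.[Log]
WHERE B = @b;
```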

These are the very basic rules. Add to this that JOIN conditions may, or may not, be considered the same as WHERE clauses. For larger results, non-covering indexes may hit the tipping point. And indexes can satisfy not only search conditions; they may also help with ORDER BY clauses.

The actual indexes to create depend a lot on your query pattern, I/O capabilities, update load and, not least, the data size management overhead (the impact on the size of files and backups). The engine will give you hints on what indexes could be used for queries (the Missing Indexes feature), but the engine will in no way balance the benefits of the index against the cost of one extra index (I/O, update performance, size of data). There are Index Design Guidelines that are quite good, but of course you have to go through reading them. Ultimately, choosing proper indexes depends on so many factors and conditions that it is impossible to give a cookie-cutter answer.
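For reference, the Missing Indexes hints can be read from the dynamic management views with a query along these lines. This is a minimal sketch: the `TOP (20)` cut-off and the ordering expression are arbitrary choices, and the output is only a list of suggestions that still has to be weighed against the update cost described above.

```sql
-- Missing-index suggestions recorded for the current database
-- since the last SQL Server restart.
SELECT TOP (20)
    mid.statement           AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact    -- estimated % improvement, per the optimizer
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
WHERE mid.database_id = DB_ID()
ORDER BY migs.user_seeks * migs.avg_user_impact DESC;
```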

Remus Rusanu
A: 

I would place an index on Start (datetime) and that's all, on the assumption that few queries against the log will be inception-to-date and most will run from a starting point forward.
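A minimal sketch of that single index, with a hypothetical index name and an arbitrary cut-off date for the example query:

```sql
-- One index on the start time only.
CREATE NONCLUSTERED INDEX IX_Log_Start
    ON dbo.[Log] ([Start]);

-- "From a starting point forward": a range seek on Start.
DECLARE @Since datetime = '20240101';  -- arbitrary example date

SELECT COUNT(*)
FROM dbo.[Log]
WHERE [Start] >= @Since;
```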

Tim