I have a table which holds performance data for a system. Each record is a call made to some important method and consists of the method name, its duration, and a token. Each request to the system is given a unique token, so all of the records with the same token belong to the same request, e.g.:
CallName Duration Token
----------- ----------- -----------
GetData 121 12345
Process 800 12345
SaveData 87 12345
GetData 97 ABCDE
Process 652 ABCDE
SaveData 101 ABCDE
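For reference, the table is roughly this (the column types here are just illustrative, the real ones don't matter much for the question):

CREATE TABLE Requests (
    CallName varchar(100) NOT NULL,  -- name of the method that was called
    Duration int          NOT NULL,  -- how long the call took
    Token    varchar(20)  NOT NULL   -- identifies which request the call belonged to
);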
I am interested in aggregate data grouped by the Token and the CallName, for example:
-- The total duration of each request, in descending order
SELECT Token, SUM(Duration) FROM Requests GROUP BY Token ORDER BY SUM(Duration) DESC
-- The average duration of each call, in descending order
SELECT CallName, AVG(Duration) FROM Requests GROUP BY CallName ORDER BY AVG(Duration) DESC
Now this table is potentially very large, and I'm only ever interested in the top few records of each query, so I've implemented paging for both of them (roughly as sketched below). The trouble is that because these queries involve aggregate functions, SQL Server ends up doing a full table scan anyway.
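For what it's worth, the paging currently looks something like this (ROW_NUMBER() over the aggregate; @PageSize / @PageNumber and the exact arithmetic aren't important, it's just to show the shape of it):

DECLARE @PageSize int, @PageNumber int;
SET @PageSize = 20;
SET @PageNumber = 1;

WITH Totals AS (
    SELECT Token,
           SUM(Duration) AS TotalDuration,
           ROW_NUMBER() OVER (ORDER BY SUM(Duration) DESC) AS RowNum
    FROM Requests
    GROUP BY Token
)
SELECT Token, TotalDuration
FROM Totals
WHERE RowNum BETWEEN (@PageNumber - 1) * @PageSize + 1
              AND     @PageNumber * @PageSize
ORDER BY RowNum;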
Surely other people have had this problem before?
What I really need here is an "index" on SUM(Duration) grouped by Token, i.e. a table where I can do things like:
SELECT Token, SumToken FROM RequestTokens ORDER BY SumToken DESC
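Concretely I'm picturing something like this, with SumToken kept up to date as rows go into Requests (the names and types are just placeholders, and there'd presumably be a similar table keyed on CallName for the average):

CREATE TABLE RequestTokens (
    Token    varchar(20) NOT NULL PRIMARY KEY,
    SumToken int         NOT NULL   -- running SUM(Duration) for this token
);

-- So the "top requests" query above becomes a cheap ordered index scan
CREATE INDEX IX_RequestTokens_SumToken ON RequestTokens (SumToken DESC);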
- Is this a really bad idea?
- If so, is there a better way?
- What would be the best way to do this? Would triggers on INSERT / UPDATE / DELETE work (where I update the aggregate values based on the old values and the changed data), or would I be better off manually updating my "index" table whenever I change the main table?
Triggers are the best solution I've come up with so far (a rough sketch of the INSERT case is below), but I can already see this being a deadlock / consistency nightmare! :-S
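For the INSERT case I'm imagining something along these lines (just a sketch against the RequestTokens table above; UPDATE and DELETE would need equivalent handling):

CREATE TRIGGER trg_Requests_Insert ON Requests
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Add the new durations to the running totals for tokens we already track
    UPDATE rt
    SET rt.SumToken = rt.SumToken + i.DeltaSum
    FROM RequestTokens rt
    INNER JOIN (SELECT Token, SUM(Duration) AS DeltaSum
                FROM inserted
                GROUP BY Token) i
        ON i.Token = rt.Token;

    -- Create running totals for tokens that aren't in the summary table yet
    INSERT INTO RequestTokens (Token, SumToken)
    SELECT i.Token, SUM(i.Duration)
    FROM inserted i
    WHERE NOT EXISTS (SELECT 1 FROM RequestTokens rt WHERE rt.Token = i.Token)
    GROUP BY i.Token;
END

Even in this simple form I can see the UPDATE-then-INSERT pair racing with itself when two connections insert the first rows for the same token at once, which is exactly the consistency problem I'm worried about.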