Hi there,

I need to build a reporting interface for an application I'm working on, which requires administrators to visualise huge quantities of data collected over time.

Think of something similar to Google Analytics.

Most of the data that needs to be visualised sits in a basic table containing a datetime, an 'action' varchar and other filterable columns. The table currently holds 1.5M rows, and it grows every day.

At the moment I'm running a simple SELECT with the filters applied, grouped by day, and it performs pretty well, but I'm wondering whether there's a smarter, more efficient way to extract this kind of data.
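For reference, the current approach might look something like the following minimal sketch. It uses Python's built-in `sqlite3` module purely for illustration; the table name `events`, the columns `created_at`, `action` and `country`, the function `daily_counts` and the sample rows are all hypothetical stand-ins for the real schema. Datetimes are assumed to be stored as ISO-8601 text so that `date()` can truncate them to a day.

```python
import sqlite3


def daily_counts(conn, action, start, end):
    """Count events per day for one action in the half-open range [start, end)."""
    # date() truncates an ISO-8601 datetime string to its YYYY-MM-DD day.
    sql = """
        SELECT date(created_at) AS day, COUNT(*) AS n
        FROM events
        WHERE action = ? AND created_at >= ? AND created_at < ?
        GROUP BY day
        ORDER BY day
    """
    return conn.execute(sql, (action, start, end)).fetchall()


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT, action TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2024-01-01 09:15:00", "login", "UK"),
        ("2024-01-01 17:40:00", "login", "US"),
        ("2024-01-02 08:05:00", "login", "UK"),
        ("2024-01-02 08:06:00", "logout", "UK"),
    ],
)

print(daily_counts(conn, "login", "2024-01-01", "2024-01-03"))
# → [('2024-01-01', 2), ('2024-01-02', 1)]
```

Every report request runs this GROUP BY over the raw rows, which is what the answers below try to avoid as the table grows.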

Cheers

A: 

You can start by doing a couple of things:

  1. Make sure you add indexes on all the filter columns so your queries don't have to do full table scans.

  2. Check the query plan (e.g. with your database's query plan analyzer) to make sure nothing needs further optimization.

  3. Since your table has a datetime column, partitioning by date will definitely help you as the table grows.
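A minimal sketch of points 1 and 2, again using Python's `sqlite3` module with a hypothetical `events` table and a hypothetical index name `idx_events_action_created`: a composite index covering the equality filter and the date range, followed by a look at SQLite's `EXPLAIN QUERY PLAN` output. Other databases have their own equivalents (e.g. `EXPLAIN`), and SQLite has no built-in table partitioning, so point 3 is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw table, as in the question.
conn.execute("CREATE TABLE events (created_at TEXT, action TEXT, country TEXT)")

# Composite index: the equality-filtered column first, then the range column,
# so "action = ? AND a range on created_at" can seek instead of scanning.
conn.execute(
    "CREATE INDEX idx_events_action_created ON events (action, created_at)"
)

QUERY = """
    SELECT date(created_at) AS day, COUNT(*)
    FROM events
    WHERE action = ? AND created_at >= ? AND created_at < ?
    GROUP BY day
"""


def plan_details(conn, sql, params):
    """Return the human-readable 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column of each row.
    return [row[-1] for row in rows]


for step in plan_details(conn, QUERY, ("login", "2024-01-01", "2024-02-01")):
    print(step)
```

The exact wording of the plan varies between SQLite versions, but it should report a SEARCH on `events` using the index rather than a full SCAN.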

Good luck.

CodeToGlory
A: 

You can expect a set of common queries, probably a small number compared to the number of unique filter combinations that could be generated. You can use this to "compress" the data into companion summary tables, and run this collection process at night.
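One way to sketch this "companion table" idea, assuming the same hypothetical `events` table: a summary table `daily_action_counts` keyed by day, action and country, and a function `refresh_day` (also hypothetical) that a nightly job would call for the day that just finished. Deleting and re-inserting that day's rows makes the job safe to re-run.

```python
import sqlite3


def refresh_day(conn, day):
    """(Re)build one day's rows of the summary table from the raw events."""
    with conn:  # commits on success, rolls back on error
        # Remove any earlier run for this day so the job can be re-run safely.
        conn.execute("DELETE FROM daily_action_counts WHERE day = ?", (day,))
        conn.execute(
            """
            INSERT INTO daily_action_counts (day, action, country, n)
            SELECT date(created_at), action, country, COUNT(*)
            FROM events
            WHERE created_at >= ? AND created_at < date(?, '+1 day')
            GROUP BY date(created_at), action, country
            """,
            (day, day),
        )


# Hypothetical raw table, summary table and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (created_at TEXT, action TEXT, country TEXT)")
conn.execute(
    """
    CREATE TABLE daily_action_counts (
        day TEXT, action TEXT, country TEXT, n INTEGER,
        PRIMARY KEY (day, action, country)
    )
    """
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2024-01-01 09:15:00", "login", "UK"),
        ("2024-01-01 17:40:00", "login", "US"),
        ("2024-01-01 18:00:00", "login", "UK"),
        ("2024-01-02 08:05:00", "login", "UK"),
    ],
)

# A nightly job would call this once per finished day.
for day in ("2024-01-01", "2024-01-02"):
    refresh_day(conn, day)

# Reports read the small summary table instead of the raw rows.
report = conn.execute(
    "SELECT day, SUM(n) FROM daily_action_counts"
    " WHERE action = ? GROUP BY day ORDER BY day",
    ("login",),
).fetchall()
print(report)
# → [('2024-01-01', 3), ('2024-01-02', 1)]
```

The summary table grows by at most one row per day per action/country combination, rather than one row per raw event.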

Joel
A: 

1) Two tiers: raw data and summarized data. For the raw data, indexes will likely be of no help. You are doing aggregations, which in most cases necessitate a full table scan. If your query isn't doing one, reorganize it so it does; it'll be faster.

2) Figure out which aggregates you need, generate them automatically, and run the reports off the aggregated data. Do index these summary tables!

3) Avoid joins on the raw data. Aggregate first, materialize the results of the GROUP BYs, then join the aggregated results.

4) Partition. Keep data for one day (or whatever granularity makes sense) separate from data for another day. Write automated table-creation scripts if necessary (grown-up databases, or feature-heavy ones depending on your point of view, give you something called "partitioning" to do this more sanely).

5) Read up on "data warehousing" http://en.wikipedia.org/wiki/Data_warehouse
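A minimal sketch of point 3, assuming a hypothetical second raw table `orders` alongside `events`: each raw table is aggregated to one row per day and materialized as a temporary table, and only those small results are joined. Joining the raw tables on the day directly would pair every visit with every order from the same day, inflating the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical raw tables: the events table plus an invented "orders" table.
conn.execute("CREATE TABLE events (created_at TEXT, action TEXT)")
conn.execute("CREATE TABLE orders (created_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("2024-01-01 09:00:00", "visit"),
        ("2024-01-01 10:00:00", "visit"),
        ("2024-01-02 11:00:00", "visit"),
        ("2024-01-02 11:05:00", "login"),
    ],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("2024-01-01 09:30:00", 20.0), ("2024-01-01 12:00:00", 5.0)],
)

# Step 1: aggregate each raw table on its own and materialize the result.
conn.execute(
    """
    CREATE TEMP TABLE visits_per_day AS
    SELECT date(created_at) AS day, COUNT(*) AS visits
    FROM events WHERE action = 'visit'
    GROUP BY day
    """
)
conn.execute(
    """
    CREATE TEMP TABLE revenue_per_day AS
    SELECT date(created_at) AS day, SUM(amount) AS revenue
    FROM orders
    GROUP BY day
    """
)

# Step 2: join the small per-day results (one row per day in each).
rows = conn.execute(
    """
    SELECT v.day, v.visits, COALESCE(r.revenue, 0.0) AS revenue
    FROM visits_per_day AS v
    LEFT JOIN revenue_per_day AS r ON r.day = v.day
    ORDER BY v.day
    """
).fetchall()
print(rows)
# → [('2024-01-01', 2, 25.0), ('2024-01-02', 1, 0.0)]
```

The LEFT JOIN keeps days that have visits but no orders; which side to keep is a reporting choice, not something the answer specifies.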

SquareCog