views:

67

answers:

3

Hi Guys! I want to create a 'Google Analytics'-type application for the web - i.e. a web-based tool to do some reporting and graphing for my database. The problem is that the database is HUGE, so I can't run the queries in real time because they will take too long and the tool will be unresponsive.

How can I use a cron job to help me? What is the best way to make my graphs responsive? I think I will need to denormalize some of my database tables, but how do I make these queries faster? What intermediate values can I store in another database table to make them quicker?

Thanks!

A: 

A simple way to approach this would be to create a selection of summary tables to contain pre-aggregated data. These could be populated on a regular basis using crontab.
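
For illustration, here is a minimal sketch of that approach using Python and SQLite - the table and column names (page_views, viewed_at, visitor_id, daily_page_views) are invented and would map onto your own schema. The dashboard then queries the small summary table instead of the raw data.

    # refresh_summaries.py - minimal sketch of a cron-driven summary-table refresh.
    # All table and column names here are hypothetical examples.
    import sqlite3

    def refresh_daily_summary(db_path="analytics.db"):
        conn = sqlite3.connect(db_path)
        try:
            conn.executescript("""
                CREATE TABLE IF NOT EXISTS daily_page_views (
                    day        TEXT PRIMARY KEY,   -- e.g. '2010-06-01'
                    page_views INTEGER NOT NULL,
                    visitors   INTEGER NOT NULL
                );
            """)
            # Rebuild the pre-aggregated rows from the big raw table.
            conn.execute("DELETE FROM daily_page_views;")
            conn.execute("""
                INSERT INTO daily_page_views (day, page_views, visitors)
                SELECT date(viewed_at), COUNT(*), COUNT(DISTINCT visitor_id)
                FROM page_views
                GROUP BY date(viewed_at);
            """)
            conn.commit()
        finally:
            conn.close()

    if __name__ == "__main__":
        refresh_daily_summary()

A crontab entry along the lines of "0 3 * * * python /path/to/refresh_summaries.py" would then rebuild the summary every night.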

Alternatively, it may be worth looking at something like http://mondrian.pentaho.org.

Richard
A: 

I would familiarize myself with "star schemas".
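
To make that concrete, here is a rough, hypothetical sketch of a tiny star schema for this kind of reporting: one fact table holding the measures, surrounded by the dimension tables it joins to. The names (fact_page_view, dim_date, dim_page) are invented for illustration.

    # star_schema.py - sketch of a small star schema; all names are hypothetical.
    import sqlite3

    DDL = """
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        day     TEXT,      -- '2010-06-01'
        month   TEXT,
        year    INTEGER
    );

    CREATE TABLE dim_page (
        page_id INTEGER PRIMARY KEY,
        url     TEXT,
        section TEXT
    );

    -- The fact table holds the measures plus foreign keys to the dimensions.
    CREATE TABLE fact_page_view (
        date_id  INTEGER REFERENCES dim_date(date_id),
        page_id  INTEGER REFERENCES dim_page(page_id),
        views    INTEGER,
        visitors INTEGER
    );
    """

    if __name__ == "__main__":
        conn = sqlite3.connect("warehouse.db")
        conn.executescript(DDL)
        conn.close()

Reports then join the fact table to whichever dimensions they group by, which keeps the aggregation queries simple and fast.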

epitka
+1  A: 

Business Intelligence (BI) is a pretty mature discipline - and you'll find answers to your questions in any book on scaling databases for reporting & data warehousing.

A high-level list of tactics would include:

  • partitioning (because indexes are of little help for most reporting)
  • summary tables (usually generated by a batch process submitted via cron)
  • a good optimizer (some databases, like MySQL, don't have one and so make poor join decisions)
  • query parallelism (some databases provide near-linear speedups just by splitting your query across multiple threads; a hand-rolled sketch of the idea follows this list)
  • a star schema - a good data model is crucial to good performance
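
On query parallelism: databases that support it do the splitting internally, but the sketch below shows the idea by hand with Python threads against a hypothetical page_views table - each thread aggregates one month and the partial results are combined at the end. (Against a single-file database like SQLite you won't see a real speedup; the pattern pays off on a server database that can scan partitions concurrently.)

    # parallel_query.py - sketch of splitting one big aggregation into per-month
    # chunks that run concurrently; table/column names are hypothetical.
    import sqlite3
    from concurrent.futures import ThreadPoolExecutor

    MONTHS = ["2010-01", "2010-02", "2010-03", "2010-04"]

    def count_views(month, db_path="analytics.db"):
        # One connection per thread; sqlite3 connections shouldn't be shared.
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute(
                "SELECT COUNT(*) FROM page_views "
                "WHERE strftime('%Y-%m', viewed_at) = ?",
                (month,),
            ).fetchone()
            return row[0]
        finally:
            conn.close()

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=4) as pool:
            partials = list(pool.map(count_views, MONTHS))
        print("total views:", sum(partials))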

In general, dynamic reporting beats the pants off static reporting - so if you're after powerful reporting I'd copy the data into an appropriate model, use aggregates, and possibly change databases to get a good optimizer and the appropriate features, rather than run reports in batch.

KenFar