Ok, I am building a fairly large statistics system, which needs to allow users to request statistics for a given set of filters (e.g. a date range).

e.g. This is a simple query that returns 10 results, including each player's player_id and the number of kills that player has made:

SELECT player_id, SUM(kills) as kills
FROM `player_cache`
GROUP BY player_id
ORDER BY kills DESC
LIMIT 10
OFFSET 30

The above query skips the first 30 results (i.e. with 10 results per page, it returns the fourth 'page'). When the user then selects the 'next' page, it will use OFFSET 40 instead of 30.

My problem is that nothing is cached: even though the LIMIT/OFFSET pair is applied to the same dataset, the SUM() is performed all over again, just to offset the results by 10 more.

The above example is a simplified version of a much bigger query which just returns more fields, and takes a very long time (20+ seconds, and will only get longer as the system grows).

So I am essentially looking for a solution to speed up the page load, by caching the state before the LIMIT/OFFSET is applied.
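To make the idea concrete, here is one shape such a cache could take (a sketch only; the cache table name is an assumption, and since MySQL temporary tables only live for the duration of one connection, a real cache across page loads would need a normal table keyed by a hash of the filter set):

```sql
-- Run the expensive aggregate once per filter set into a cache table...
CREATE TEMPORARY TABLE page_cache AS
SELECT player_id, SUM(kills) AS kills
FROM player_cache
GROUP BY player_id;

-- ...then every page becomes a cheap read against the cached rows:
SELECT player_id, kills
FROM page_cache
ORDER BY kills DESC
LIMIT 10 OFFSET 30;
```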

Thanks!

+1  A: 

Have you considered periodically running your long query and storing all the results in a summary table? The summary table can be quickly queried because there are no JOINs and no GROUPings. The downside is that the summary table is not up-to-the-minute current.

I realize this doesn't address the LIMIT/OFFSET issue, but it does fix the issue of running a difficult query multiple times.

dnagirl
Unfortunately this is not possible because of the query filters. Different filters MUST perform the long query again, which I'll have to live with; I am just trying to fix the LIMIT/OFFSET issue. The only way I could use this would be to have a separate summary table for each type of filter, and those are limitless since date ranges could be anything.
Twelve-60
There must be some way to design your summary table so that it matches your filter needs (probably by adding more fields to the summary table for all relevant filters).
Saggi Malachi
How? With a date range filter, the number of possible queries is limitless!
Twelve-60
There is no general version of the long query that leaves off the date filtering? That is what should populate the summary table. If the *only* thing that is making the long query slow is the date filtering then, yes, a different approach is required.
dnagirl
The way to handle summary values for date range filters is to pre-calculate different summary values for different points in time. Then, based on your query, you look for the calculated value in your summary and then just add onto it the deltas since that time. The date range for the deltas will be much smaller and your SUM will process much more quickly.
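A sketch of this snapshot-plus-delta idea (the snapshot table, the `created_at` column, and the `@latest_snap` placeholder are all assumptions; the real schema may differ):

```sql
-- Periodic snapshot of each player's cumulative kills up to snap_date:
CREATE TABLE kills_snapshot (
  snap_date DATE   NOT NULL,
  player_id INT    NOT NULL,
  kills     BIGINT NOT NULL,
  PRIMARY KEY (snap_date, player_id)
);

-- Current totals = latest snapshot + only the (much smaller set of)
-- deltas recorded since that snapshot was taken:
SELECT d.player_id,
       d.kills + COALESCE(SUM(p.kills), 0) AS kills
FROM kills_snapshot d
LEFT JOIN player_cache p
       ON p.player_id = d.player_id
      AND p.created_at > d.snap_date
WHERE d.snap_date = @latest_snap
GROUP BY d.player_id, d.kills
ORDER BY kills DESC;
```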
Bernard Chen
A: 

Depending on how often the data is updated, data-warehousing is a straightforward solution to this. Basically you:

  1. Build a second database (the data warehouse) with a similar table structure
  2. Optimise the data warehouse database for getting your data out in the shape you want it
  3. Periodically (e.g. overnight each day) copy the data from your live database to the data warehouse
  4. Make the page get its data from the data warehouse.

There are different optimisation techniques you can use, but it's worth looking into:

  1. Removing fields which you don't need to report on
  2. Adding extra indexes to existing tables
  3. Adding new tables/views which summarise the data in the shape you need it.
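One possible warehouse-side rollup along the lines of point 3 (the `warehouse`/`live` database names and the `created_at` column are assumptions): summarise per player per day during the nightly copy, so report queries scan far fewer rows.

```sql
-- Nightly rollup in the warehouse, shaped for reporting:
CREATE TABLE warehouse.player_kills_by_day AS
SELECT player_id,
       DATE(created_at) AS day,
       SUM(kills)       AS kills
FROM live.player_cache
GROUP BY player_id, DATE(created_at);

-- Index matched to the date-range filters the reports actually use:
CREATE INDEX idx_day_kills ON warehouse.player_kills_by_day (day, kills);
```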
vincebowdren
+2  A: 

You can of course use caching, but I would recommend caching the result in your application, not relying on the MySQL query cache.

But first things first: make sure that a) you have proper indexing on your data, and b) it is actually being used.
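Both checks can be done directly in MySQL; a sketch (the index name is an assumption):

```sql
-- a) Add a covering index so the GROUP BY can be satisfied from the index:
ALTER TABLE player_cache ADD INDEX idx_player_kills (player_id, kills);

-- b) Confirm with EXPLAIN that the index is actually used by the query:
EXPLAIN
SELECT player_id, SUM(kills) AS kills
FROM player_cache
GROUP BY player_id
ORDER BY kills DESC
LIMIT 10 OFFSET 30;
```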

If that does not help (GROUP BY tends to be slow on large datasets), you will need to put the summary data in a static table/file/database.

There are several techniques/libraries that help you perform server-side caching of your data. PHP Caching to Speed up Dynamically Generated Sites offers a pretty simple but self-explanatory example of this.

Peter Lindqvist