views: 280

answers: 4
I have a php web application where certain data changes on a weekly basis but is read very frequently.

The SQL queries that retrieve the data and the php code for html output are fairly complex. There are multiple table joins, and numerous calculations - but they result in a fairly basic html table. Users are grouped, and the table is the same for each group each week, but different for different groups. I could potentially have hundreds of tables for thousands of users.

For performance reasons, I'd like to cache this data. Rather than running these queries and calculations every time someone hits the page, I want to run a weekly process to generate the table for each group giving me a simple read when required.

I'd be interested to know what techniques you've used, successfully or unsuccessfully, to achieve something like this.

Options I can see include:

  • Storing the html result of the calculations in a MySQL table, identified by user group (a rough sketch of this is below)
  • Storing the resultant data in a MySQL table, identified by user group (difficult as there's no fixed number of data items)
  • Caching the page output in static files

Any other suggestions would be welcome!
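
For illustration, the first option might look something like this (a rough sketch only; the table and column names are made up):

// Hypothetical schema: group_table_cache (group_id INT PRIMARY KEY, html TEXT, generated_at DATETIME)
function getCachedTableHtml($groupId) {
    $result = mysql_query(sprintf(
        "SELECT html FROM group_table_cache WHERE group_id = %d", $groupId));
    $row = mysql_fetch_assoc($result);
    return $row ? $row['html'] : false; // false = no cached copy yet
}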

+3  A: 

There are indeed a few options:

  • Prerender the pages on a weekly basis and then serve them "statically" (see the sketch after this list).
  • Use a cache (e.g. Squid) to cache such responses on a first-chance basis for a week. For example, you can configure the caching policy so requests that go to a particular page (e.g. very_long.php?...) are cached separately from the rest of the website.
  • Make sure you turn on DB caching. MySQL has caching of its own and you can fine tune it so that repeated long queries are not recalculated.
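
For the first option, the weekly job could be as simple as this (a sketch; generateGroupTable() stands in for your existing queries and HTML code):

// prerender.php -- run weekly from cron, e.g. every Monday at 03:00:
//   0 3 * * 1 php /path/to/prerender.php
$result = mysql_query("SELECT DISTINCT group_id FROM users");
while ($row = mysql_fetch_assoc($result)) {
    $html = generateGroupTable($row['group_id']); // the expensive part
    file_put_contents("/var/www/static/groups/{$row['group_id']}.html", $html);
}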
Assaf Lavie
Some great suggestions, thanks.
Damovisa
Good list of suggestions here. Note that MySQL query caching is not a complete fix-all - it has benefits and drawbacks (it can make things worse on frequently-updated data). As is suggested by Assaf, it is only one of many options.
thomasrutter
+2  A: 

First of all, profile. Verify that those queries are really consuming a significant amount of time; maybe the MySQL query result cache has already done the work for you.

If they really are consuming resources, what I would do is create a table with the computed results, and a procedure that does all the needed management, to be called when the data changes. Those frequent reads should go only to the pre-computed data, without bothering to check whether it's still valid.

Simply add some hooks to the procedures that modify the base data, or database triggers if you can. These would be executed infrequently (weekly?) and could take a lot of time to generate their results.
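
A minimal sketch of that idea (the table and function names here are assumptions, not part of the answer):

// Pre-computed results live in their own table and are only rebuilt when the data changes
function refreshGroupResults($groupId) {
    $html = buildGroupHtml($groupId); // the expensive joins and calculations
    mysql_query(sprintf(
        "REPLACE INTO group_results (group_id, html, generated_at) VALUES (%d, '%s', NOW())",
        $groupId, mysql_real_escape_string($html)));
}

// Hook it into whatever performs the weekly update of the base data:
function updateBaseData($groupId, $newData) {
    saveBaseData($groupId, $newData); // existing update logic
    refreshGroupResults($groupId);    // rebuild the cached result, however long it takes
}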

Javier
I think this is the first time I've seen a question with four answers that are all good ones (all deserving an up-vote)! Good suggestions here. The need to profile before you optimize is a good point.
thomasrutter
True - I was pretty impressed!
Damovisa
+3  A: 

Make the function that generates the table store its result to a file on disk:

/cache/groups/1.txt
/cache/groups/2.txt

You don't necessarily have to run a weekly batch job for it: when calling the function to get the data, check whether the cache is out of date (or non-existent). If so, generate and cache the results then; if not, just return the cached file.

function getGroupTable($groupId) {
    // Derive the cache file path for this group (was undefined in the original)
    $cacheFile = "/cache/groups/{$groupId}.txt";
    if (cacheIsStale($groupId)) {
        generateCache($groupId);
    }
    return file_get_contents($cacheFile);
}

The cacheIsStale() function could just look at the file's timestamps to test for freshness.
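
For example (a sketch; the one-week threshold is an assumption):

function cacheIsStale($groupId) {
    $cacheFile = "/cache/groups/{$groupId}.txt";
    // Stale if the file doesn't exist or is more than a week old
    return !file_exists($cacheFile)
        || (time() - filemtime($cacheFile)) > 7 * 24 * 3600;
}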

nickf
This is how I handled such a problem long ago.
Loren Pechtel
It's a good solution and probably one I'll use. I'm just concerned about the potential (future) need to use the data generated rather than just the html.
Damovisa
In that case, you could perhaps store the results from MySQL in a different, more accessible format. CSV would work easily (the fgetcsv function helps here), or you could even put the result into an array and then serialize it. Of course, you could keep two cache files: HTML as well as raw data.
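
Something like this, as a rough sketch ($rows standing in for the query result array):

// Keep the raw data alongside the HTML cache
file_put_contents("/cache/groups/{$groupId}.dat", serialize($rows));
// Later, when the data itself is needed rather than the html:
$rows = unserialize(file_get_contents("/cache/groups/{$groupId}.dat"));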
nickf
+2  A: 

It seems you already have most of it covered.

One other option, assuming the table data is not huge, is to use memcache to cache the results. This would probably be the fastest solution, although you would need to check memory requirements to see whether it's a viable option.
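
A rough sketch using the PHP Memcache extension (the key name and one-week expiry are assumptions):

$memcache = new Memcache();
$memcache->connect('localhost', 11211);

$key = "group_table_{$groupId}";
$html = $memcache->get($key);
if ($html === false) {
    $html = generateGroupTable($groupId); // the existing expensive code
    $memcache->set($key, $html, 0, 7 * 24 * 3600); // cache for a week
}
echo $html;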

jcinacio