First of all, the website I run is on a hosted server, so I don't have access to install anything interesting like memcached.

I have several web pages displaying HTML tables. The data for these tables is generated by expensive, complex MySQL queries. I've optimized the queries as far as I can and put indexes in place to improve performance. The problem is that under high traffic the MySQL server gets hammered and struggles.

Interestingly, the data within the MySQL tables doesn't change very often. In fact, it changes only after a certain 'event' that takes place every few weeks.

So what I have done now is this:

  1. Save the HTML table once generated to a file
  2. When the URL is accessed check the saved file if it exists
  3. If the file is older than one hour, run the query and save a new file; otherwise, output the saved file

This ensures that the vast majority of requests load the page very fast, and the data is at most an hour old. For my purposes this isn't too bad.
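Roughly, the read path looks something like this (the cache path and the build_table_html() helper here are illustrative, not my exact code):

    <?php
    // Minimal sketch of the read path described above. The cache path,
    // one-hour TTL, and build_table_html() helper are assumptions.
    $cacheFile = __DIR__ . '/cache/report.html';
    $ttl       = 3600; // one hour, matching step 3 above

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
        // Cache file exists and is fresh: serve it as-is.
        readfile($cacheFile);
    } else {
        // Cache is missing or stale: run the expensive queries and rebuild it.
        $html = build_table_html($pdo); // hypothetical: runs the queries, returns the HTML table
        file_put_contents($cacheFile, $html, LOCK_EX);
        echo $html;
    }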

What I would really like is to guarantee that if any data changes in the database, the cache file is deleted. I could find every script that runs a write query against the table and add code there to remove the cache file, but that's fragile: every future change would also have to remember this mechanism.

Is there an elegant way to do this?

I don't have anything but vanilla PHP and MySQL (recent versions) - I'd like to play with memcached, but I can't.

+4  A: 

There are only two hard things in Computer Science: cache invalidation and naming things.

—Phil Karlton

Sorry, doesn't help much, but it is sooooo true.

DanSingerman
It does help, you gave me a keyword to search with :)
Rew
+1 never heard that before, but hits home for sure.
snicker
+4  A: 

OK, serious answer.

If you have any sort of database abstraction layer (hopefully you do), you could maintain a field in the database recording the last time anything was updated, and manage it from a single point in your abstraction layer.

e.g. (pseudocode): On any update set last_updated.value = Time.now()

Then compare this to the time of the cached file at runtime to see if you need to re-query.

If you don't have an abstraction layer, create a wrapper function around every SQL update call that does this, and always use that wrapper for any future functionality.
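A minimal sketch of that wrapper in PHP (the cache_meta marker table and the PDO usage are assumptions for illustration):

    <?php
    // Sketch: every write goes through run_write_query(), which also bumps
    // a one-row marker table. Table and column names are assumptions.
    function run_write_query(PDO $pdo, string $sql, array $params = []): int
    {
        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);

        // Record the time of the last write in a single marker row.
        $pdo->exec("REPLACE INTO cache_meta (id, last_updated) VALUES (1, NOW())");

        return $stmt->rowCount();
    }

At read time, compare the marker's last_updated against the cache file's filemtime() to decide whether to re-query.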

DanSingerman
I don't have an abstraction layer. But your post made me realise that for every single table in the database I've created an Updated column which uses an automatic timestamp. So at least I can do a simple query to check the freshness of the data and delete the cache file accordingly. It's not the silver bullet I was looking for, but it's quite shiny.
Rew
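For illustration, that freshness check might look something like this (table and column names are assumed):

    <?php
    // Sketch: use the auto-updating Updated column to see whether the data
    // changed since the cache file was written. Names are assumptions.
    $lastChange = (int)$pdo->query("SELECT UNIX_TIMESTAMP(MAX(Updated)) FROM report_table")
                           ->fetchColumn();

    if (is_file($cacheFile) && filemtime($cacheFile) < $lastChange) {
        unlink($cacheFile); // data changed after the cache was built: invalidate
    }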
Actually, I have an abstraction layer of sorts: at least a method that handles all queries. I can check whether the query is an UPDATE/INSERT/DELETE and, if so, delete the relevant cache file after the query runs. Not great, but it should achieve what I want with a degree of robustness.
Rew
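A rough sketch of that idea, assuming a single query method and an illustrative cache path:

    <?php
    // Sketch: one query method for everything; writes also invalidate the cache.
    function query(PDO $pdo, string $sql, array $params = [])
    {
        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);

        // If this was a write, delete the cached HTML so it gets rebuilt.
        if (preg_match('/^\s*(INSERT|UPDATE|DELETE|REPLACE)\b/i', $sql)) {
            $cacheFile = __DIR__ . '/cache/report.html'; // assumed path
            if (is_file($cacheFile)) {
                unlink($cacheFile);
            }
        }

        return $stmt;
    }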
+1  A: 

You have most of the bases covered, but a last_modified field and a cron job might help.

There's no way to delete files from within MySQL itself; Postgres would give you that facility, but MySQL won't.

partoa
Yep, last_modified (or in my case an Updated column that auto-updates) seems to be a good way to go.
Rew
A: 

You can cache your output to a string using PHP's output buffering functions. Google it and you'll find a nice collection of websites explaining how this is done.
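For example, a capture-and-save pass with output buffering might look like this (render_expensive_table() and the cache path are assumptions):

    <?php
    // Sketch of caching via PHP's output buffering functions.
    $cacheFile = __DIR__ . '/cache/report.html'; // assumed path

    ob_start();                  // start capturing everything echoed below
    render_expensive_table();    // hypothetical: runs the queries and echoes the HTML table
    $html = ob_get_clean();      // stop buffering and return the captured output

    file_put_contents($cacheFile, $html, LOCK_EX);
    echo $html;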

I'm wondering, however: how do you know the data expires after an hour? Or are you assuming the data won't change dramatically enough within 60 minutes to warrant constant page regeneration?

Lachlan McDonald
The 60-minute limit means only one request every 60 minutes does the heavy work and rebuilds the cache file. The number was arbitrary. I guess that instead of rebuilding the cache file, the script could check whether the data has really changed and, if not, just update the timestamp on the cache file so it becomes fresh again.
Rew
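That refinement could look something like this sketch, reusing the assumed Updated column and cache path from above:

    <?php
    // Sketch: if the data is unchanged, touch() the cache file to reset its
    // age instead of regenerating it. Table/column names are assumptions.
    $lastChange = (int)$pdo->query("SELECT UNIX_TIMESTAMP(MAX(Updated)) FROM report_table")
                           ->fetchColumn();

    if (is_file($cacheFile) && filemtime($cacheFile) >= $lastChange) {
        touch($cacheFile);                // data unchanged: cache counts as fresh again
    } else {
        $html = build_table_html($pdo);   // hypothetical rebuild, as in the question
        file_put_contents($cacheFile, $html, LOCK_EX);
    }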