caching is useful if you do a lot of reads but seldom update. the more often the data in the database changes, the more problematic a caching system becomes. caching also adds complexity to your codebase, which can be a pain to handle - and in the worst case it can even slow your site down.
the most important question is:
when do you have to invalidate your cache? when does it become stale? in most cases: when the database query would return different rows than at the time you cached the page. but how do you know that? you don't (maybe there is a way, but i can't think of one atm), because to check, you'd probably have to run the query anyway and compare the results.
what you can do:
1. clear your whole cache every time the relevant parts of the database are updated
this is possible if your database only rarely gets updated - hourly, daily, weekly. but it's useless if changes come in continually, which is the case with most web projects.
2. clear cached items after something happens
this only works if changes don't have to be reflected instantly (i.e. it doesn't matter if there's stale data for some time). in that case you simply clear the cache for an item if it's older than X minutes, or if more than Y pageviews have happened.
3. clear only the relevant pieces
here you have to figure out which parts of the cache are affected when you update the database. done right, changes are reflected instantly while performance improves.
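option 2 fits in a few lines. here's a python sketch (the answer is php-flavored, but the idea is the same); `MAX_AGE`, the cache dir and the file naming are made up:

```python
import os
import time

CACHE_DIR = "cache"
MAX_AGE = 300  # "older than X minutes" - 5 minutes here, pick your own

def get_cached(key, build):
    """serve the cached copy unless it's missing or older than MAX_AGE;
    build() stands in for the expensive querying and formatting."""
    path = os.path.join(CACHE_DIR, key + ".tmp")
    if os.path.exists(path) and time.time() - os.path.getmtime(path) < MAX_AGE:
        with open(path) as f:
            return f.read()
    html = build()
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "w") as f:
        f.write(html)
    return html
```

stale data can survive for up to MAX_AGE seconds - that's the trade-off this option makes.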
most likely you'll end up with option 3: you have to figure it out. so, as an example, let's take the classic case of a weblog, consisting of a frontpage, archive pages and a detail page for every entry.
changes are introduced by: the admin-panel (crud for entries) and comments
if an entry gets edited or deleted, you have to clear the cache for:
- frontpage, if the entry was new
- the relevant archive page, if the entry was old
- the detail-page for the entry
if someone comments, you just have to clear the detail page - but only if the number of comments is not displayed on the index or archive pages. otherwise it's the same as entry crud.
if something sitewide is changed, the whole cache has to be cleared (bad!)
now, let's think about entry crud and the archive. if the archive is of the type "one page per month", just clear the month the entry belongs to. but if the archive is paginated as entries 1-10, 11-20, 21-30, ..., most likely the whole archive cache has to be rebuilt.
and so on ...
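for the weblog example, the mapping from "entry changed" to "cache keys to clear" could look like this (python sketch; the key names and the entry dict are invented, not from any framework):

```python
def keys_to_clear(entry):
    """map one changed entry to the cache keys that have to go."""
    keys = ["blogentry_%d" % entry["id"]]  # the detail-page, always
    if entry["is_new"]:
        keys.append("frontpage")           # new entries show up on the frontpage
    else:
        # assumes a "one page per month" archive; with 1-10, 11-20, ...
        # pagination you'd have to clear the whole archive cache instead
        keys.append("archive_%s" % entry["month"])
    return keys
```

this mapping is exactly the part you have to get right - miss a key here and that page serves stale data.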
some of the problems:
if you don't identify all the affected pieces correctly, you end up with stale data and/or dead links.
if updates happen too often, building the cache is just additional work, because by the time the next pageview happens the cache is most probably stale again and has to be rebuilt anyway.
some parts of the page are unfit for caching, e.g. the (custom) search function. if the cache works elsewhere everything is fast and great, but searching is still awfully slow.
it can be problematic if you have to clear the whole cache while lots of requests are coming in. that can choke your server, because a cache miss is normally more expensive than serving the page without caching in the first place. even worse: if 3 requests come in and the first one can't finish building the cache before the other two are handled, the page gets built 3 times instead of once.
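one crude way around that pile-up (my own addition, not something the text above prescribes) is a lock file: the first miss builds, concurrent misses see the lock and serve a fallback instead of building too. python sketch:

```python
import os

def serve(path, build):
    """first cache-miss takes a lock file and rebuilds; anyone arriving
    while the lock exists gets a cheap fallback instead of piling on.
    (not race-free - a real implementation would use an atomic lock.)"""
    lock = path + ".lock"
    if os.path.exists(path):
        with open(path) as f:
            return f.read()
    if os.path.exists(lock):
        return "<p>try again in a second</p>"  # or serve stale data instead
    open(lock, "w").close()
    try:
        html = build()  # the expensive part
        with open(path, "w") as f:
            f.write(html)
    finally:
        os.remove(lock)
    return html
```

serving stale data while one request rebuilds is usually nicer than the "try again" fallback, if you have stale data around.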
my advice:
- optimize your database. indexes and config ok? maybe it works without caching.
- optimize your queries. "explain select" is your friend!
- only cache parts of the page - the expensive ones. fill in the small, cheap, per-request bits with str_replace and placeholders.
- if everything works, use apc or memcached instead of files (files usually work great, but apc/memcached are faster). you can even use your database to cache your database - that often works great!
- decide whether you're building a lazy or an eager caching system. lazy means: build the cache when the page is first requested. eager means: rebuild it right after the update.
meh, i don't have any real advice for you. depends too much on the problem :)
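the lazy-vs-eager point in a nutshell, as a python sketch (an in-memory dict stands in for whatever cache backend you use; names are made up):

```python
cache = {}

def render(entry_id):
    # stands in for the expensive querying and formatting
    return "<p>entry %d</p>" % entry_id

def view_lazy(entry_id):
    # lazy: build the cache the first time the page is requested
    if entry_id not in cache:
        cache[entry_id] = render(entry_id)
    return cache[entry_id]

def update_entry_eager(entry_id):
    # eager: rebuild right after the update, before anyone asks -
    # no request ever pays the rebuild cost, but updates get slower
    cache[entry_id] = render(entry_id)
```

lazy wastes no work on pages nobody visits; eager makes the first visitor after an update fast. pick per page type.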
update
there's a request for the blog entry with key 256. it shows the blog entry, the comments and who is currently logged in. it's expensive to query the entry and the comments and to format all the text. the currently logged-in user resides in the session.
first, create a unique key for the part you want to cache. in this case, the cache key probably is the database id of the entry (with some prefix and postfix), so the cached file should have the name cache/blogentry_256.tmp. then:
1. check if that file exists.
2. if it doesn't exist, do all the expensive querying and formatting, leave a placeholder (e.g. {username}) where the name of the current user should be, and save the result to cache/blogentry_256.tmp. be careful not to write any data into this file that shouldn't be displayed to everyone or that changes on every request.
3. now read the file (or reuse the data from step 2) and str_replace the username into the placeholder. echo the result.
4. if an entry gets changed or someone comments, delete the cache file with the entry's id.
this is lazy caching - the cache is built only when a user requests the page. be careful: if a user types {username} into a comment, it gets substituted there too! that means you have to escape your cached data and unescape it after str_replacing. this technique works with memcached or apc too.
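the steps above, sketched in python (the answer talks php/str_replace, but the mechanics are identical; function names are made up). one shortcut: instead of an escape/unescape pair, this sketch escapes the braces as html entities, so the "unescape" happens in the browser:

```python
import os

CACHE_DIR = "cache"

def esc(user_text):
    # escape user-supplied data before it goes into the cache, so a
    # commenter who types {username} doesn't get the real name spliced in
    return user_text.replace("{username}", "&#123;username&#125;")

def show_entry(entry_id, current_user, build):
    """lazy cache for one blog entry; build() does the expensive work
    and must leave a {username} placeholder in its output."""
    path = os.path.join(CACHE_DIR, "blogentry_%d.tmp" % entry_id)
    if os.path.exists(path):
        with open(path) as f:
            html = f.read()
    else:
        html = build()  # the expensive querying and formatting
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(path, "w") as f:
            f.write(html)
    # the per-request part is filled in on every hit
    return html.replace("{username}", current_user)

def entry_changed(entry_id):
    # on edit/delete/comment: just delete the cache file for that id
    path = os.path.join(CACHE_DIR, "blogentry_%d.tmp" % entry_id)
    if os.path.exists(path):
        os.remove(path)
```

the same shape works with memcached or apc - swap the file reads/writes for get/set/delete calls on the same key.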
problems: you have to build your design around that caching decision. e.g. if you want to display "comment posted 5 minutes ago" instead of "comment added on May 6th, 3:42pm", you're in trouble.