I've read several guides on implementing a PHP cache system (my site is custom-coded, fairly query-heavy and growing), including this one: http://www.snipe.net/2009/03/quick-and-dirty-php-caching/

I understand them fully, but there are certain parts of the page that I can't cache. What's the best way to handle those parts?

+2  A: 

Why can't you cache them? If it's because they change very rapidly, then you're probably better off trying to reduce the retrieval and rendering overhead.

If the contents vary according to the user, then you can offer up different cached versions per user. If you're using an object cache, for example, then include the user's identifier or some other unique value in the cache key. If you're using a more generic, HTTP-level cache such as Squid, then you can set the Vary header, e.g. Vary: Cookie, which tells the proxy to keep a separate copy for each distinct Cookie header instead of serving one user's page to another.
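A minimal sketch of the per-user key idea, assuming a numeric user id is available from the session. The function name `user_cache_key` and the key layout are made up for illustration and are not part of any library:

```php
<?php
// Hypothetical helper: build a cache key that is unique per section and per user.
function user_cache_key(string $section, ?int $userId): string
{
    // Anonymous visitors share one cached copy; logged-in users get their own.
    $who = $userId === null ? 'anon' : 'user' . $userId;
    return $section . ':' . $who;
}

// For an HTTP-level cache the equivalent hint is a response header.
// Guarded so the sketch also runs from the command line.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Vary: Cookie');
}
```

The resulting string can be used as a memcached/APC key or hashed into a file name.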

You should still instruct caches and proxies not to store any sensitive information in public caches.

Rob
I can't cache the whole page because it does referral logging etc. using PHP. I suppose I could put that above ob_start, but I'm also going to have a toplist which either must not be cached or needs a separate (shorter) cache.
zuk1
+2  A: 

caching is useful if you do a lot of reads but seldom update. the more often the data in the database changes, the more problematic a caching system becomes. caching does add a certain complexity to your codebase, which can be a pain to handle. and it can even slow your site down in the worst case.

the most important question is:
when do you have to invalidate your cache? when does it become stale? in most of the cases, if the database-query returns different rows than at the time you cached that page. but how do you know that? you don't (maybe there is a way, but i can't think of any atm), because to check that, you probably have to query the result to compare.

what you can do:

  1. clear all your cache every time the relevant parts of the database are updated
    this is indeed possible if your database only rarely gets updated - hourly, daily, weekly. but it's useless if changes are coming in continually. that's the case with most web projects.

  2. clear cached items after a certain time or number of pageviews
    this only works if changes do not have to be reflected instantly (i.e. it doesn't matter if there's incorrect data for some time). in this case you could simply clear the cache for a certain item if it's older than X minutes, or after more than Y pageviews have happened.

  3. clear only the relevant pieces
    here you have to figure out which parts of the cache are affected when you're updating the database. if done right, changes are reflected instantly while performance improves.

most likely it's option 3: you have to find out which pieces are affected. so, as an example, let's take the classic case of a weblog, consisting of a frontpage, archive pages and a detail-page for every entry.

changes are introduced by: the admin-panel (crud for entries) and comments

if an entry gets edited or deleted, you have to clear the cache for:

  • frontpage, if the entry was new
  • the relevant archive page, if the entry was old
  • the detail-page for the entry

if someone comments, you just have to clear the detail-page, but only if the number of comments is not displayed in the index or archive. otherwise, same as entry-crud.

if something sitewide is changed, the whole cache has to be cleared (bad!)

now, let's think about entry-crud and the archive. if the archive is of the type "one page per month", then clear the month the entry belongs to. but if the archive is paged like entry 1-10, 11-20, 21-30, ... most likely the whole archive-cache has to be rebuilt.
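the weblog rules above could be sketched roughly like this, assuming a "one page per month" archive and one cache file per page. the file names and both function names are invented for the example:

```php
<?php
// Hypothetical naming scheme: cache/frontpage.tmp, cache/archive_2009-05.tmp,
// cache/blogentry_256.tmp. Returns the files to delete when an entry changes.
function cache_files_for_entry_change(int $entryId, string $month, bool $isNew): array
{
    // The detail-page is always affected.
    $files = ["cache/blogentry_{$entryId}.tmp"];
    // New entries show up on the frontpage, old ones in their monthly archive.
    $files[] = $isNew ? 'cache/frontpage.tmp' : "cache/archive_{$month}.tmp";
    return $files;
}

// Delete the given cache files, ignoring ones that were never built.
function clear_cache_files(array $files): void
{
    foreach ($files as $file) {
        if (is_file($file)) {
            unlink($file);
        }
    }
}
```

the admin-panel would call clear_cache_files(cache_files_for_entry_change(...)) right after saving an entry.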

and so on ...

some of the problems:

  • if you don't identify all the affected pieces correctly, it can lead to stale data and/or (un-)dead links.

  • if updates happen too often, building the cache is additional work, because when the next pageview happens, the cache is most probably stale again and has to be rebuilt anyway.

  • some parts of the page are unfit for caching, e.g. the (custom) search function. if the cache works elsewhere everything is fast and great, but searching is still awfully slow.

  • it can be problematic if you have to clear the whole cache while lots of requests are happening. it can then choke your server, because a cache-miss is normally more expensive than serving a page that was never cached in the first place. even worse, if 3 requests are coming in, and the first request can't finish caching the page before the other two are handled, the cache gets built 3 times instead of once.
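one possible way to soften the last problem with file-based caches is to rebuild under an exclusive lock, so that concurrent requests wait for a single rebuild. this is only a sketch under that assumption; `get_or_build` and the `.lock` file naming are made up here:

```php
<?php
// Hypothetical helper: return the cached file, building it at most once even
// when several requests miss the cache at the same time.
function get_or_build(string $file, callable $build): string
{
    if (is_file($file)) {
        return file_get_contents($file);
    }
    $lock = fopen($file . '.lock', 'c');
    if ($lock === false) {
        return $build(); // cannot lock: build without caching
    }
    flock($lock, LOCK_EX); // blocks until other builders are done
    try {
        // Another request may have built the file while we were waiting.
        if (!is_file($file)) {
            // Write to a temp file and rename, so readers never see a partial file.
            $tmp = $file . '.' . getmypid() . '.part';
            file_put_contents($tmp, $build());
            rename($tmp, $file);
        }
        return file_get_contents($file);
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

the requests that lose the race still block while the winner builds, but the expensive queries run only once.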

my advice:

  • optimize your database. keys and config ok? maybe it works without caching.

  • optimize your queries. "explain select"!

  • only cache parts of the page - the expensive ones. fill in small, cheap changes with str_replace and placeholders

  • if everything works, use apc or memcached instead of files (files usually work great, but apc/memc are faster). you can also use your database to cache your database, often that works great!

  • are you building a lazy or an eager caching system? lazy means: build the cache when the page's first requested, eager means: right after the update.

meh, i don't have any real advice for you. depends too much on the problem :)

update

there's a request for the blog-entry with key 256. it shows the blog-entry, the comments and who is currently logged in. it's expensive to query the entry and the comments, and to format all the text and everything. the currently logged in user resides in the session.

first, create a unique key for the part you want to cache. in this case, the cache-key probably is the database id of the entry (with some prefix and postfix).

so, the cached file should have the name cache/blogentry_256.tmp. check whether that file exists.

  1. if it doesn't exist, do all the expensive querying and formatting, leave a placeholder (e.g. {username}) where the name of the current user should be and save the result to cache/blogentry_256.tmp. be careful not to write any data into this file that shouldn't be displayed for everyone or that changes on every request.

  2. now, read the file (or reuse the data from 1) and str_replace the username into the placeholder. echo the result.
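the two steps above might look roughly like this. it's a sketch, not a finished implementation: `blog_entry_html` is an invented name, and `$render` stands in for whatever does the expensive querying and formatting:

```php
<?php
// Hypothetical helper. $render must return HTML that contains the literal
// placeholder {username} where the current user's name belongs.
function blog_entry_html(int $id, string $username, callable $render, string $dir = 'cache'): string
{
    $file = $dir . '/blogentry_' . $id . '.tmp';
    if (is_file($file)) {
        // Step 2 only: the expensive part was done by an earlier request.
        $html = file_get_contents($file);
    } else {
        // Step 1: do the expensive work once and save the user-independent result.
        $html = $render($id);
        file_put_contents($file, $html, LOCK_EX);
    }
    // Step 2: fill in the cheap, per-request part.
    return str_replace('{username}', htmlspecialchars($username), $html);
}
```

only the str_replace runs on every request; the cache directory is assumed to exist and be writable.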

if an entry gets changed or someone comments, you have to delete the cache-file with the entry's id.

this is lazy caching - the cache is built only when a user requests the page. be careful - if a user types {username} into a comment, it gets replaced there too! that means you have to escape your cached data and unescape it after str_replacing. this technique works with memcached or apc too.
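a variation on the escaping idea that avoids the unescape step, assuming the cached data is HTML: encode the braces in user-supplied text as HTML entities before caching. `escape_for_cache` is a made-up name for this sketch:

```php
<?php
// Hypothetical helper: escape user-supplied text before it goes into the cached HTML.
function escape_for_cache(string $userText): string
{
    $safe = htmlspecialchars($userText, ENT_QUOTES, 'UTF-8');
    // Encode the braces as HTML entities: the browser still shows "{" and "}",
    // but str_replace('{username}', ...) no longer finds a match.
    return str_replace(['{', '}'], ['&#123;', '&#125;'], $safe);
}
```

placeholders written by your own template code stay untouched, because only the user's text goes through this function.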

problems: you have to build your design around that caching decision. e.g. if you want to display "comment posted 5 minutes ago" instead of "comment added May 6th, 3:42pm", then you're in trouble.

Schnalle
I simply want to cache most of the page for a week and other parts never at all, so that it goes from ~12 queries to 1 or 2 while ALL the functionality still remains.
zuk1
A: 

Here's my tip: create either an object or function that lets you cache a "section" based upon its name. This way you can cache part of your page, and include the rendering section within an if block. What I did was hash the incoming text and used that as the filename within './cache', and returned the boolean value of whether or not regeneration is needed; you will obviously need output buffering for this.

This would give you a cache framework of,

if(Cache::cached('index-recent-articles', 5 /* minutes */)) {
   // Cache is missing or older than 5 minutes: regenerate the section.
   Cache::start();
   echo 'stuff here';
   Cache::end('index-recent-articles');
} // And if Cache::cached could echo the cached HTML when the result is false...
  // then this is a tidy else-less bit of code.

I don't know if this is optimal, a server-based solution like memcache would be better, but the concept should help you. Obviously how you manage the cache, whether by files or by database or extension, is up to you.

Edit:

If you wish for file-based caching then a system this simple may be all you need. Obviously I haven't tested it in your shoes, and this is mostly just pulled from the back of my head, but as it stands this is a decent start for what you may desire.

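abstract class Cache {
  const path = 'cache/';
  // Map a section id to its cache file name.
  static function filename($id) { return sprintf('%s%u', Cache::path, crc32($id)); }
  // Begin capturing the output of a section.
  static function start()  { ob_start(); }
  // Store the section. Buffered output is also echoed, so the page that
  // regenerates the cache still displays the section.
  static function end($id, $text = null) {
    $buffered = ($text === null);
    if($buffered) { $text = ob_get_clean(); }
    file_put_contents(Cache::filename($id), $text, LOCK_EX);
    if($buffered) { echo $text; }
  }
  // Returns true if the section must be regenerated (file missing or too old);
  // otherwise echoes the cached copy and returns false.
  static function cached($id, $minutes = 5) {
    $filename = Cache::filename($id);
    if(!is_file($filename) || time() - filemtime($filename) > $minutes * 60) {
      return true;
    }
    echo file_get_contents($filename);
    return false;
  }
}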
The Wicked Flea
A: 

Remember that caching isn't a single solution: your application can have caching on different levels, depending on the scope of variables. Within a single request you may have an identity-map to prevent additional queries for the same data. You can use memcached to cache data across requests. The database itself has a query cache, as well as other types of lower-level caches. Your rendering engine can also cache at different levels: the whole page, individual parts of the page etc. You need to figure out what you can cache and make a strategy for that.
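As an illustration of the smallest of those scopes, here is a minimal sketch of a per-request identity map. The class name and its `get` method are assumptions for the example, not an existing API:

```php
<?php
// Hypothetical identity map: remembers values already loaded during this request.
final class IdentityMap
{
    private $rows = [];

    // $load is called only the first time a given key is requested.
    public function get(string $key, callable $load)
    {
        if (!array_key_exists($key, $this->rows)) {
            $this->rows[$key] = $load();
        }
        return $this->rows[$key];
    }
}
```

Because the map lives in an ordinary object, it disappears at the end of the request and never needs invalidating.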

Also, as with all optimisation, make sure that you measure before you tweak anything. This will ensure that 1) you actually improve things, rather than making them worse, and 2) you focus on the stuff that matters, rather than everything else.
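A rough way to get such a measurement without extra tooling, sketched with a hypothetical `time_call` helper; a real profiler will tell you more:

```php
<?php
// Hypothetical helper: run $fn and return its result plus the elapsed wall-clock seconds.
function time_call(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return [$result, microtime(true) - $start];
}
```

Wrap the same page section with and without the cache and compare the two timings over several runs.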

troelskn