tags:
views: 56
answers: 4

I am working on a search application that uses a form with 16 filter options, each of which is either 1 (selected) or 0 (not selected). The results are retrieved as JSON via an AJAX GET request.

The query string then looks like this:

filter_1=0&filter_2=1 ...omitted... &filter_16=1&page=20

Each search result has at least 2 pages, which can be browsed by the user.

My question is: how can I cache the search results based on the input parameters? My first idea was to MD5 the request parameters and then write a cache file using the hash as the filename.

Every time a new request comes in, I look for the cache file; if it exists, I use the data from that file instead of querying the database and converting the rows to a JSON result.

But this does not seem like a good idea because of the many search options. There would be quite a lot of cache files (16 * 16 ???), and because the application is only used by a few users, I doubt that all possible combinations will ever get cached. And each result contains X pages, so each of those pages would be a cache file of its own (16 * 16 * X).

What would be a good caching strategy for an application like this? Is it actually possible to implement a cache?

+1  A: 

Why do you need the cache?

If the app is only used by a few users then caching may not actually be required.

Toby Hede
A: 

Given the requirements you describe (a small number of users), caching all combinations seems reasonable to me, assuming caching makes sense here at all. How much time does a typical query take? Since you say the application will only be used by a few people, is it even worth caching? My very rough rule of thumb: if the query does not take several seconds, don't worry about caching. If it takes less than a second, and you don't need the application to be super responsive, no caching should be needed.

Otherwise, I would say (again, given the small number of users) that caching all combinations is OK. Even if a very large number of them were used, there are still at most 65536 (2^16) of them, and most modern operating systems can easily handle thousands of files in a directory (in case you plan to cache into files). In any case, it would be reasonable to limit the number of items in the cache and purge old entries regularly. Also, I would not use MD5; I would just concatenate the zeros and ones from your filters for the cache key (e.g. 0101100010010100).
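The concatenated-key idea can be sketched like this (the helper name and the parameter array are illustrative, not from the answer; parameter names follow the question's query string):

```php
<?php
// Build a cache key by concatenating the 16 filter bits, plus the page number.
// Missing or unselected filters count as '0'.
function search_cache_key(array $params): string
{
    $bits = '';
    for ($i = 1; $i <= 16; $i++) {
        $bits .= (($params["filter_$i"] ?? '0') === '1') ? '1' : '0';
    }
    return $bits . '_p' . (int)($params['page'] ?? 1);
}

// Filters 2 and 16 selected, page 20:
echo search_cache_key(['filter_2' => '1', 'filter_16' => '1', 'page' => '20']);
// "0100000000000001_p20" -- usable directly as a cache file name
```

The key is human-readable, collision-free by construction, and needs no hashing.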

Jan Zich
+1  A: 

Because all of your search parameters are flags that can be either 0 or 1, you might consider bitmasking.

Each of your filters would represent a value that is a power of 2:

$filter_1 = 1;
$filter_2 = 2;
$filter_3 = 4;
...
$filter_8 = 128;
...
$filter_16 = 32768;

By using PHP's bitwise operators, you can easily store all 16 filter values in a single integer. For instance, the value "129" can only be produced by the combination of filter_1 and filter_8. If the user selected filter_1 and filter_8, you could determine the bitmask by doing:

$bitmask = $filter_1 | $filter_8; // gives 129

With a unique bitmask representing the state of all your filters, you can simply use that as your cache key as well, with no expensive MD5 operations needed. So in this case, you would save a file named "129" into your cache.

This technique also gives you an easy tool to invalidate your cache, as you can check new and updated records to determine which filters they match, and delete any file that has that "bit" set in its name, i.e. if ((((int)$filename) & $filter) === $filter) unlink($filename); (note the parentheses: in PHP, == binds more tightly than &). If your tables have frequent writes, scanning the cache like this could cause some performance issues, but it's a decent technique for a read-heavy application.
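Both halves of this technique can be sketched as follows (the function names are illustrative; the cache directory is assumed to contain files named after their bitmasks):

```php
<?php
// Bit i-1 represents filter_i: filter_1 = 1, filter_2 = 2, ..., filter_16 = 32768.
function filters_to_bitmask(array $selectedFilters): int
{
    $mask = 0;
    foreach ($selectedFilters as $i) {   // e.g. [1, 8] for filter_1 and filter_8
        $mask |= 1 << ($i - 1);
    }
    return $mask;
}

echo filters_to_bitmask([1, 8]);  // 129 -- the cache file name for this filter set

// Invalidation: remove every cached result whose mask includes a changed filter's bit.
function purge_cache_for_filter(string $cacheDir, int $filterBit): void
{
    foreach (glob($cacheDir . '/*') as $file) {
        if (((int)basename($file)) & $filterBit) {
            unlink($file);   // this cached result was built with the changed filter
        }
    }
}
```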

This is an approach I love to use when dealing with bits or flags. You should consider carefully whether you really need caching like this, however. If you only have a few users of the system, are you really going to have performance problems from a few search queries? Also, MySQL has built-in query caching, which performs very well in a read-heavy application. If your result page generation routines are expensive, then caching the output fragments can definitely be beneficial, but if you're only talking about microseconds of performance for a handful of users, it might not be worth it.

zombat
A: 

First verify you actually need a cache (like Toby suggested).

After that, think about how fresh the information needs to be, because you'll need to flush out old values. You may want to use a preexisting solution for this, such as memcached.

$key = calc_key();
$result = $memcache->get($key);
if (!$result) {
    $result = get_data_from_db();
    /* cache result for 3600 seconds == 1 hour */
    $memcache->set($key, $result, 0, 3600);
}
/* use $result */
orip