I'm using Zend_Cache_Core with Zend_Cache_Backend_File to cache results of queries executed for a model class that accesses the database.

Basically the queries themselves should form the id by which to cache the obtained results, only problem is, they are too long. Zend_Cache_Backend_File doesn't throw an exception, PHP doesn't complain but the cache file isn't created.

I've come up with a solution that is not efficient at all, storing any executed query along with an autoincrementing id in a separate file like so:

0->>SELECT * FROM table
1->>SELECT * FROM table1,table2
2->>SELECT * FROM table WHERE foo = bar

You get the idea; this way I have a unique id for every query. I clean out the cache whenever an insert, delete, or update is done.

Now I'm sure you see the potential bottleneck here: for any test, save, or fetch from the cache, two (or three, when we need to add a new id) requests are made to the file system. This may even defeat the purpose of caching altogether. So is there a way I can generate a unique id, i.e. a much shorter representation of the queries, in PHP without having to store them on the file system or in a database?

+1  A: 

Strings are arbitrarily long, so obviously it's impossible to create a fixed-size identifier that can represent any arbitrary input string without duplication. However, for the purposes of caching, you can usually get away with a solution that's simply "good enough" and reduces collisions to an acceptable level.

For example, you can simply use MD5, where the chance of two different strings colliding is about 1 in 2^128. If you're still worried about collisions (and you probably should be, just to be safe) you can store the query and the result in the "value" of the cache, and check when you get the value back that it's actually the query you were looking for.

As a quick example (my PHP is kind of rusty, but hopefully you get the idea):

$query = "SELECT * FROM ...";

// PHP concatenates strings with "." (not "+"); Zend_Cache ids may only contain [a-zA-Z0-9_]
$key = "hash_" . hash("md5", $query);
$result = $cache->load($key);
if ($result == null || $result[0] != $query) {
    // object wasn't in cache, do the real fetch and store it
    $result = $db->execute($query); // etc

    $result = array($query, $result);
    $cache->save($result, $key);
}

// the result is now in $result[1] (the original query is in $result[0])
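
The same idea can be wrapped in a small helper. What follows is only a rough sketch, not code from the question: `queryCacheKey()` and `cachedQuery()` are made-up names, a plain PHP array stands in for the Zend cache object, and a closure stands in for the real database call. The point is just to show the hashed key plus the "is this really my query?" check in one place.

```php
<?php
// Hypothetical helper: turn a query of any length into a short, fixed-length id.
// The "q_" prefix is arbitrary; it only uses characters from [a-zA-Z0-9_].
function queryCacheKey($query)
{
    return 'q_' . md5($query);
}

// Hypothetical helper: look the query up in $cache (a plain array used here
// as a stand-in for a real cache backend) and run it only on a miss.
function cachedQuery($query, array &$cache, $runQuery)
{
    $key = queryCacheKey($query);

    // Hit only if the entry exists AND the stored query matches exactly;
    // this guards against two different queries hashing to the same key.
    if (isset($cache[$key]) && $cache[$key][0] === $query) {
        return $cache[$key][1];
    }

    // Miss (or collision): do the real fetch and store query + result together.
    $rows = $runQuery($query);
    $cache[$key] = array($query, $rows);
    return $rows;
}

// Usage with a fake "database" that counts how often it is actually hit.
$cache = array();
$calls = 0;
$fakeDb = function ($q) use (&$calls) {
    $calls++;
    return array("rows for: " . $q);
};

$first  = cachedQuery("SELECT * FROM table", $cache, $fakeDb);
$second = cachedQuery("SELECT * FROM table", $cache, $fakeDb); // served from $cache
```

With a real cache you would swap the array reads and writes for the `load()` and `save()` calls shown above; the comparison against the stored query stays the same.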
Dean Harding
Thanks! I'm trying that right now. Two questions. I thought hashing should produce the same result when given the same input string over and over. Is this not correct? And what length will the MD5 hash be? I think the queries themselves were rejected as ids by the operating system because the resulting filenames were too long. Thank you though, I am trying it, but there'll be a couple of places to refactor so it's taking a while. I knew I had to cache the query along with the result, I just couldn't figure out how!
Joey
I've figured out the answers to both questions in the previous comment and I've posted the answer below. But I'm still not sure about collisions with the MD5 hash; could someone please explain that to me?
Joey
@Joey: Using MD5 with the same string will always produce the same output, but the problem is that there is a 1 in 2^128 chance that two *different* strings will also produce the same output. So it's possible (though unlikely) that two different queries would hash to the same MD5 key. That's why I added the extra check in there: to ensure that doesn't happen.
Dean Harding
A: 

MD5!!

MD5 generates a string of length 32, and that seems to be working fine: the cache files are created (with filenames of about 47 characters), so the operating system doesn't seem to reject them.

//returns id for a given query
function getCacheId($query) {
    return md5($query);
}

And that's it! But there's still the issue of collisions, and I think salting the MD5 hash (maybe with the name of the table) should make it more robust.

//returns id for a given query
function getCacheId($query, $table) {
    return md5($table . $query);
}

If anyone wants the full code for how I've implemented the results caching, just leave a comment and I'll be happy to post it.
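
For anyone who wants to check the two properties discussed here (same input gives the same id, and the id is always 32 characters), a quick throwaway script like the one below should do. The queries are just sample strings, nothing from a real schema.

```php
<?php
// Sample queries, purely illustrative.
$query = "SELECT * FROM table WHERE foo = 'bar'";
$other = "SELECT * FROM table1, table2";

$idA = md5($query);
$idB = md5($query);   // hashing the same string again
$idC = md5($other);   // hashing a different string

var_dump($idA === $idB);                              // bool(true): same input, same id
var_dump(strlen($idA));                               // int(32), regardless of query length
var_dump(preg_match('/^[0-9a-f]{32}$/', $idA) === 1); // bool(true): lowercase hex only
var_dump($idA !== $idC);                              // bool(true) for these two strings
```

Since the output is only lowercase hex digits, it is safe to use in a filename no matter what characters the original query contained.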

Joey
I don't know why my code appeared that way, I apologize.
Joey