ansaurus

Question

Generating a unique id for a given string using php

Answer 1

+1 A:

Strings are arbitrarily long, so obviously it's impossible to create a fixed-size identifier that can represent any arbitrary input string without duplication. However, for the purposes of caching, you can usually get away with a solution that's simply "good enough" and reduces collisions to an acceptable level.

For example, you can simply use MD5: any two given distinct strings will produce the same hash by accident with a probability of only about 1 in 2¹²⁸. If you're still worried about collisions (and you probably should be, just to be safe) you can store the query and the result in the "value" of the cache, and check when you get the value back that it's actually the query you were looking for.

As a quick example (my PHP is kind of rusty, but hopefully you get the idea):

$query = "SELECT * FROM ...";

$key = "hash-" . hash("md5", $query);
$result = $cache->load($key);
if ($result == null || $result[0] !== $query) {
    // object wasn't in cache, do the real fetch and store it
    $result = $db->execute($query); // etc

    $result = array($query, $result);
    $cache->save($result, $key);
}

// the result is now in $result[1] (the original query is in $result[0])
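
For anyone who wants to run the idea end to end, here is a minimal self-contained sketch of the same pattern. The plain array used as the cache and the runQuery() helper are hypothetical stand-ins (they are not part of any real cache or database library); a real cache object and database call would replace them.

```php
<?php
// Hypothetical in-memory stand-in for the cache; a real cache object
// would replace this array.
$cache = [];

// Hypothetical stand-in for the real database call.
function runQuery(string $query): array
{
    return ["rows for: " . $query];
}

function cachedQuery(array &$cache, string $query): array
{
    $key = "hash-" . md5($query);

    // Use the cached entry only if it was stored for this exact query text.
    if (isset($cache[$key]) && $cache[$key][0] === $query) {
        return $cache[$key][1];
    }

    // Miss, or an MD5 collision with a different query: fetch and overwrite.
    $rows = runQuery($query);
    $cache[$key] = [$query, $rows];
    return $rows;
}

$first  = cachedQuery($cache, "SELECT * FROM users");
$second = cachedQuery($cache, "SELECT * FROM users"); // served from $cache
```

Because the stored query text is compared with ===, an entry that belongs to a different query is treated as a miss and replaced, which is the extra safety check described above.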

Dean Harding 2010-06-21 01:28:23

Thanks! I'm trying that right now. Two questions. I thought hashing should produce the same result when given the same input string over and over. Is this not correct? And what length will the MD5 hash be? I think the queries themselves were rejected as ids by the operating system because the resulting filenames were too long. Thank you though; I am trying it, but there are a couple of places to refactor, so it's taking a while. I knew I had to cache the query along with the result, I just couldn't figure out how!

Joey 2010-06-21 02:16:05

I've figured out the answers to both questions in the previous comment and I've posted the answer below. But I'm still not sure about collisions with the MD5 hash; could someone please explain that to me?

Joey 2010-06-21 02:57:08

@Joey: Using MD5 with the same string will always produce the same output, but the problem is that there is a 1 in 2^128 chance that two *different* strings will also produce the same output. So it's possible (though unlikely) that two different queries would hash to the same MD5 key. That's why I added the extra check in there: to ensure that doesn't happen.
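
A rough sketch of both points, in case it helps. The collisionChance() helper is a hypothetical name, and it uses the standard birthday approximation (about n(n-1)/2 pairs, each colliding with probability 1 in 2^128), so treat the number as an order-of-magnitude estimate for accidental collisions only.

```php
<?php
$a = md5("SELECT * FROM users");
$b = md5("SELECT * FROM users");
$c = md5("SELECT * FROM orders");

var_dump($a === $b);   // same input, so the same hash every time
var_dump(strlen($a));  // md5() returns 32 hexadecimal characters
var_dump($a === $c);   // different inputs: equal only if they collide

// Birthday approximation: chance of at least one accidental collision
// among $n distinct strings, assuming MD5 output behaves like random bits.
function collisionChance(float $n): float
{
    return $n * ($n - 1) / 2 / pow(2, 128);
}

printf("%.2e\n", collisionChance(1e9));
```

Even with a billion distinct queries the estimate comes out on the order of 10^-21, so the check against the stored query is cheap insurance rather than something that will fire often.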

Dean Harding 2010-06-21 04:13:27

Answer 2

A:

MD5!!

MD5 generates a string of length 32, and that seems to be working fine: the cache files are created (with filenames about 47 characters long), so the operating system doesn't seem to reject them.

//returns id for a given query
function getCacheId($query) {
    return md5($query);
}

And that's it! But there's still that issue of collisions, and I think salting the MD5 hash (maybe with the name of the table) should make it more robust.

//returns id for a given query and table
function getCacheId($query, $table) {
    return md5($table . $query);
}

If anyone wants the full code for how I've implemented the results caching, just leave a comment and I'll be happy to post it.
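
One caveat on the table-name prefix, offered as a sketch rather than a fix: prepending the table name does not change MD5's collision odds, but it does keep ids for different tables apart, provided the two parts cannot blur together. Plain concatenation is ambiguous ("ab" . "c" is the same string as "a" . "bc"), so the version below puts a separator between them. The "|" separator is an assumption, and it only helps if table names never contain that character.

```php
<?php
// Returns a cache id for a query, namespaced by table name.
// The "|" separator is an assumption: it keeps ("ab", "c") and ("a", "bc")
// from concatenating to the same string before hashing, as long as
// table names never contain "|".
function getCacheId(string $query, string $table): string
{
    return md5($table . "|" . $query);
}

$id = getCacheId("SELECT * FROM users", "users");
echo strlen($id), "\n"; // 32
```

The id is still 32 characters, so the cache filenames stay the same length as with the unsalted version.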

Joey 2010-06-21 02:52:40

I don't know why my code appeared that way, I apologize.

Joey 2010-06-21 02:54:22
