I'm writing a simple C++ class in which I would like to cache thumbnail versions of images downloaded from the web. To do that, I would like to use a hash function that takes a URL string and outputs a unique string suitable for use as a filename.

Is there a simple way to do this without re-writing the function myself? I searched around for a simple library, but couldn't find anything. Surely this is a common problem.

+2  A: 

A simpler approach is to replace everything that is not a letter or a digit with an underscore.

EDIT: Here's a naive implementation in C:

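#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of str in which every character that is not a
   letter or a digit is replaced by '_'. The caller must free() the result. */
char *safe_url(const char *str) {
    char *safe = strdup(str);
    if (safe == NULL)
        return NULL;
    for (size_t i = 0; safe[i] != '\0'; i++) {
        /* Cast to unsigned char: passing a negative char to isalnum() is undefined. */
        if (!isalnum((unsigned char)safe[i]))
            safe[i] = '_';
    }
    return safe;
}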
JesperE
That seems too likely to result in collisions.
erickson
You may also run into maximum filename length limits.
Eclipse
This can also collide because of case-insensitivity on OS X.
Darius Bacon
A: 

What about boost::hash?

John at CashCommons
+3  A: 

In a similar situation I encoded the key's bytes in hex (where, in your case, the key is the hash of the URL). This doubles the size but is simple, avoids any possible problems with your filesystem mangling the characters, and sorts in the same order as the original key.

(Originally I tried a slightly fancier, more efficient encoding, which I thought escaped any problematic characters, but OS X's filesystem turns out to be crazier than I assumed.)
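
For illustration, here is a minimal C++ sketch of the hash-then-hex idea. The answer does not say which hash was used, so 64-bit FNV-1a is only a stand-in here, and the names fnv1a64, hex_name and cache_filename are made up for this example. A 64-bit hash is not truly unique, so collisions are unlikely but possible; a longer hash can be hex-encoded the same way.

```cpp
#include <cstdint>
#include <string>

// 64-bit FNV-1a. Assumption: used only as an example hash; any hash that
// gives the same value on every run would work in its place.
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;                  // FNV 64-bit prime
    }
    return h;
}

// Hex-encode the key, most significant digit first, zero-padded to
// 16 lowercase digits so the names sort in the same order as the keys.
std::string hex_name(std::uint64_t key) {
    static const char digits[] = "0123456789abcdef";
    std::string out(16, '0');
    for (int i = 15; i >= 0; --i) {
        out[i] = digits[key & 0xF];
        key >>= 4;
    }
    return out;
}

// Hypothetical helper: map a URL to a cache filename.
std::string cache_filename(const std::string& url) {
    return hex_name(fnv1a64(url));
}
```

The resulting names have a fixed length and contain only digits and lowercase letters, so they should avoid both the filename-length and the case-insensitivity problems mentioned in the comments on the other answer.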

Darius Bacon