What is a fast hash function available on the iPhone for hashing web URLs (images)?

I'd like to store each cached web image as a file named after a hash, because I assume the raw URL may contain characters that cause problems on the file system.

The hash function doesn't need to be cryptographic, but it definitely needs to be fast.

Example:

Input: http://www.calumetphoto.com/files/iccprofiles/icc-test-image.jpg

Output: 3573ed9c4d3a5b093355b2d8a1468509

This output was produced with MD5, but since I don't know much about the topic, I'm not sure whether it is overkill (i.e., too slow).

+1  A: 

I think NSObject already has a hash method, and NSString and NSURL override it, so you could try those. In most cases it should be fast enough; it's the same hash that is used when you put an NSString into an NSDictionary :) See NSObject hash.
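As a rough sketch of turning such an integer hash into a filename, the C helper below (the name `hash_to_filename` is hypothetical) zero-pads the value to 16 lowercase hex digits so every name has the same length. It assumes the hash has already been widened to 64 bits; `-hash` returns an `NSUInteger`, which is 32 bits on 32-bit devices and 64 bits on 64-bit ones. Hex digits are safe in file names, which avoids the "strange characters" concern from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write a hash value as a fixed-width, lowercase hex
 * filename. Returns the number of characters written, or -1 if `buf` is
 * too small to hold all 16 digits plus the terminating NUL. */
static int hash_to_filename(uint64_t hash, char *buf, size_t buflen) {
    assert(buf != NULL);
    /* %016llx: 16 hex digits, zero-padded on the left. */
    int n = snprintf(buf, buflen, "%016llx", (unsigned long long)hash);
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

For example, a hash value of `0x3573ed9c` becomes the name `000000003573ed9c`. Note that this only changes how the number is written; it does nothing to make collisions less likely.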

vodkhang
The return value is an integer, which I can convert to a string and use as a filename. However, is that "strong" enough to distinguish between the many different URLs out there? How likely is it that two different URLs produce the same hash?
znq
I can't find any documentation on that :(. But I think that if you have a small number of URLs (10-100, just a guess), it should be OK. I've also found that people usually use MD5 to generate the hash, so MD5's performance is probably not a big problem.
vodkhang
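One way to put numbers on the collision question is the birthday bound. Under the idealized assumption that the hash spreads inputs uniformly over 2^b values, n distinct URLs give n(n-1)/2 pairs, so the expected number of colliding pairs is n(n-1)/2 / 2^b. By the union bound, this is also an upper bound on the probability of at least one collision. The C sketch below (hypothetical function name `expected_collisions`) just evaluates that formula. It does not claim that NSString's `-hash` actually behaves like an ideal hash.

```c
#include <assert.h>

/* Expected number of colliding pairs among n distinct inputs under an
 * idealized, uniformly distributed hash with `bits` output bits:
 *     C(n, 2) / 2^bits
 * By the union bound this also upper-bounds P(at least one collision);
 * a result above 1 means collisions are practically certain. */
static double expected_collisions(double n, int bits) {
    assert(n >= 0.0 && bits >= 0);
    double space = 1.0;
    for (int i = 0; i < bits; i++)
        space *= 2.0;                      /* 2^bits possible hash values */
    double pairs = n * (n - 1.0) / 2.0;    /* number of distinct input pairs */
    return pairs / space;
}
```

With a 32-bit hash, 100 URLs give about 1.2 × 10^-6 expected collisions, but a million URLs give more than 100. A 128-bit MD5 digest with a million URLs stays around 1.5 × 10^-27.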
Thanks. I timed both, and MD5 takes about the same time as [myObject hash].
znq
+3  A: 

MD5 may be broken for security purposes, but it works well for the situation you describe. Here's a thread on how to implement it on the iPhone; check out Vroomtrap's post. For posterity, here's my own version of that code:

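// Intended to live in an NSString category (e.g. @implementation NSString (MD5)).
- (NSString *)MD5Hash {
    const char *cStr = [self UTF8String];
    unsigned char result[CC_MD5_DIGEST_LENGTH];   // 16-byte MD5 digest

    // CC_MD5 takes a CC_LONG (uint32_t) length, so cast the size_t from strlen.
    CC_MD5(cStr, (CC_LONG)strlen(cStr), result);

    // Format the 16 digest bytes as 32 uppercase hex characters.
    return [NSString stringWithFormat: @"%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X%02X",
        result[0], result[1], result[2], result[3], result[4], result[5], result[6], result[7],
        result[8], result[9], result[10], result[11], result[12], result[13], result[14], result[15] ];
}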

You'll need to import the CommonCrypto/CommonDigest.h header.
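The long `stringWithFormat:` call only converts the 16 digest bytes to hex. A loop does the same thing and works for any digest length. The plain-C sketch below (the name `digest_to_hex` is hypothetical) writes lowercase hex, matching the example output in the question. The `%02X` version above produces uppercase instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a raw digest (e.g. the bytes CC_MD5 writes
 * into `result`) to a NUL-terminated lowercase hex string.
 * `out` must have room for at least 2 * len + 1 chars. */
static void digest_to_hex(const unsigned char *digest, size_t len, char *out) {
    static const char hex[] = "0123456789abcdef";
    assert(digest != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        out[2 * i]     = hex[digest[i] >> 4];    /* high nibble */
        out[2 * i + 1] = hex[digest[i] & 0x0F];  /* low nibble */
    }
    out[2 * len] = '\0';
}
```

Inside the Objective-C method, one could pass `result` and `CC_MD5_DIGEST_LENGTH` to such a helper, using a `char buf[2 * CC_MD5_DIGEST_LENGTH + 1]`, and then wrap the buffer with `[NSString stringWithUTF8String:buf]`.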

warrenm
I found this one very helpful: http://www.saobart.com/md5-has-in-objective-c/
znq
I'd recommend using `dataUsingEncoding:` instead of `UTF8String`. `strlen` is not cheap, because it has to walk the entire string to find its end before it knows the length. The NSData object already knows how long its data is.
Peter Hosey
You're welcome to do so. My empirical testing showed that `dataUsingEncoding:` performed the same as the above method on strings of moderate length (200K) and substantially worse on large strings (2M).
warrenm