tags:

views:

213

answers:

4

is there any function in C++ that calculates a fingerprint or hash of a string that's guaranteed to be at least 64 bits wide?

I'd like to replace my unordered_map<string, int> with unordered_map<long long, int>.

Given the answers that I'm getting (thanks Stack Overflow community...) the technique that I'm describing is not well-known. The reason that I want an unordered map of fingerprints instead of strings is for space and speed. The second map does not have to store strings and when doing the lookup, it doesn't incur any extra cache misses to fetch those strings. The only downside is the slight chance of a collision. That's why the key has to be 64 bits: a probability of 2^(-64) is basically an impossibility. Of course, this is predicated on a good hash function, which is exactly what my question is seeking.

Thanks again, Stack Overflowers.

+1  A: 

What exactly is it that you seek to achieve? Your map is not gonna work any better with a "bigger" hash function. Not notably, anyway.

n3rd
The hash table will be much smaller because it doesn't have to store the strings.
Neil G
Also, it will be much faster to do a lookup since it won't incur the cache misses of loading the string within the hash table.
Neil G
(Also, this is not an "answer".)
Neil G
+2  A: 

unordered_map always hashes the key into a size_t variable. This is independent from the type of the key and depends solely on the architecture you are working with.

ebo
I noticed that when I looked at the documentation. However, if size_t is not guaranteed to be 64 bits or more, then I will still have to store the strings because I can't be fairly certain that collisions won't happen.
Neil G
It's too bad that `size_t` is not a template parameter of `std::hash`.
Neil G
+2  A: 

If you want to map any string to a unique integer:

typedef std::map<string,long long> Strings;
static Strings s_strings;
long long s_highWaterMark = 0;
long long my_function(const string& s)
{
  Strings::const_iterator it = s_strings.find(s);
  if (it != s_strings.end())
  {
    //we've previously returned a fingerprint for this string
    //now return the same fingerprint again
    return it->second;
  }
  //else new fingerprint
  long long rc = ++s_highWaterMark;
  //... remember it for next time
  s_strings.insert(Strings::value_type(s, rc));
  //... and return it this time
  return rc;
}
ChrisW
Nice solution. It's too bad there isn't a built-in hash function. The whole point of this is to optimize for space and speed, and the additional look-up is going to be costly in terms of cache misses compared to calculating a hash value in place.
Neil G
I think you should look for a hash function that produces a unique mapping. Instead look for a hash function that produces a usually-unique mapping, and do something else your map (e.g. linear search) in the possible-but-rare event that there's a collision.
ChrisW
+2  A: 

c++ has no native 128 Bit type, nor does it have native hashing support. Such extensions for hashing are supposed to be added in TR1, but as far as I am aware 128 bit ints aren't supported my many compilers. (Microsoft supports an __int128 type -- only on x64 platforms though)

I'd expect the functions included with unordered_map would be faster in any case.

If you really do want to do things that way, MD5 provides a good 128 bit hash.

Billy ONeal
Thanks, this is great information.
Neil G
marked as the answer for MD5.
Neil G