hi I want to use a hashmap for words in the dictionary and the indices of the words in the dicionary.
What would be the fastest hash algorithm for this?
Thanks!
hi I want to use a hashmap for words in the dictionary and the indices of the words in the dicionary.
What would be the fastest hash algorithm for this?
Thanks!
There are many different hashing algorithms, of varying efficiency, but the most important issue is that it scatter the items fairly uniformly across the different hash buckets.
However, you may as well assume that the Microsoft engineers/library engineers have done a decent job of writing an efficient and effective hash algorithm, and just using the built-in libraries/classes.
At the bottom of this page there is a section A Note on Hash Functions with some information which you might find useful.
For convenience, I'll just replicate some links here:
The fastest hash function will be
template <class T>
size_t hash(T key) {
return 0;
}
however, though the hashing will be mighty fast, you will suffer performance elsewhere. What you want is to try several hashing algorithms on actual data and see which one actually gives you the best performance in aggregate on the actual data you expect to use if the hashing or lookup is even a performance bottleneck. Until then, go with something handy. MD5 is pretty widely available.
Have you tried just using the STL hash_map and seeing if it serves your needs before rolling anything more complex?
boost has a hash function that you can reuse for your own data (predefined for common types). That'd probably work well & fast enough if your needs aren't special.
What is your use case? A radix search tree (trie) might be more suitable than a hash if you're mapping from string to integer. Tries have the advantage of reducing key comparisons for variable length keys. (e.g., strings)
Even a binary search tree (e.g., STL's map) might be superior to a hash based container in terms of memory use and number of key comparisons. A hash is more efficient only if you have very few collisions.