Guys, I am using a dynamic programming approach to solve a problem. Here is a brief overview of the approach:

  1. Each value generated is identified using 25 unique keys.
  2. I use boost::hash_combine to generate the seed for the hash table from these 25 keys (a sketch of such a hash functor follows this list).
  3. I store the values in a hash table declared as

    boost::unordered_map<Key_Object, Data_Object, HashFunction> hashState;

  4. I profiled my algorithm and found that nearly 95% of the run time is spent retrieving/inserting data into the hash table.

  5. These are the details of my hash table:

    hashState.size() 1880

    hashState.load_factor() 0.610588

    hashState.bucket_count() 3079

    hashState.max_size() 805306456

    hashState.max_load_factor() 1

    hashState.max_bucket_count() 805306457
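
A minimal sketch of a hash functor along these lines, assuming the 25 keys are plain ints (the real Key_Object and Data_Object may look different):

    #include <boost/functional/hash.hpp>
    #include <boost/unordered_map.hpp>
    #include <cstddef>

    // Hypothetical key layout: 25 integer keys (the actual Key_Object may differ).
    struct Key_Object {
        int keys[25];
        bool operator==(const Key_Object& other) const {
            for (int i = 0; i < 25; ++i)
                if (keys[i] != other.keys[i]) return false;
            return true;
        }
    };

    struct HashFunction {
        std::size_t operator()(const Key_Object& k) const {
            std::size_t seed = 0;
            // Fold all 25 keys into one seed, as described in step 2.
            for (int i = 0; i < 25; ++i)
                boost::hash_combine(seed, k.keys[i]);
            return seed;
        }
    };

    // boost::unordered_map<Key_Object, Data_Object, HashFunction> hashState;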

I have the following two questions:

  1. Is there anything I can do to improve the performance of the hash table's insert/retrieve operations?

  2. The C++ STL has hash_multimap, which would also suit my requirements. How does Boost's unordered_map compare with hash_multimap in terms of insert/retrieve performance?

A: 

If your hash function is not the culprit, the best you can do is probably to use a different map implementation. Since your keys are quite large, using an unordered_map from the Boost.Intrusive library might be the best option. Alternatively, you could try closed hashing: Google SparseHash or MCT. Profiling is certainly needed, though, because closed hashing is recommended when elements are small enough. (SparseHash is more established and better tested, but MCT doesn't need the set_empty_key()/set_deleted_key() calls.)
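
As an illustration of the closed-hashing route, a rough sketch using SparseHash's dense_hash_map; the header path depends on the SparseHash version, Key_Object/Data_Object/HashFunction are your own types, and make_unused_key() is a hypothetical helper:

    #include <sparsehash/dense_hash_map>  // <google/dense_hash_map> in older releases

    // Replacement for the boost::unordered_map; dense_hash_map uses open
    // addressing, so probes stay within a contiguous array of entries.
    google::dense_hash_map<Key_Object, Data_Object, HashFunction> hashState;

    // dense_hash_map requires a designated "empty" key that will never be
    // used as a real key (and a "deleted" key if erase() is needed).
    Key_Object empty_key = make_unused_key();   // hypothetical helper
    hashState.set_empty_key(empty_key);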

EDIT:

I just noticed there is no intrusive map in the Boost library I mentioned, only set and multiset. Still, you can try the two closed hashing libraries.

EDIT 2:

STL hash_map is not standard; it is a vendor extension and not portable across compilers.

doublep
A: 

Are you sure that the hash function you are using is not the bottleneck? What percentage of the time does the hash function take? Can you run the same test, replacing the inserts/retrievals with a plain call to the hash function?
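
One way to run that comparison, as a minimal sketch (it reuses the Key_Object and HashFunction placeholders from the question and C++11 <chrono> for brevity):

    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Time only the hash computation over the same keys that would be inserted,
    // to see how much of the 95% is hashing versus bucket/collision work.
    void time_hash_only(const std::vector<Key_Object>& keys) {
        HashFunction hash;
        std::size_t sink = 0;  // keeps the loop from being optimized away
        auto start = std::chrono::steady_clock::now();
        for (const Key_Object& k : keys)
            sink ^= hash(k);
        auto stop = std::chrono::steady_clock::now();
        std::cout << "hash only: "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
                  << " ms (sink=" << sink << ")\n";
    }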

Vicente Botet Escriba
The hash function that generates the seed value takes 60% of the computation time. It does a boost::hash_combine over 25 key values. I tried using a simpler hash key, but then it takes a very long time to retrieve a value. FYI, I am looking at inserting and retrieving > 100,000 values into the hash table.
Amrish
Which one spent less time overall: boost::hash_combine on 25 key values, or the simple hash?
Vicente Botet Escriba
With the simple hash, hashing took a mere 1% of the algorithm's entire run time, but the algorithm as a whole took much longer to run. I believe that was because more time was spent retrieving/inserting values as a result of too many collisions!
Amrish