Guys, I am using a dynamic programming approach to solve a problem. Here is a brief overview of the approach:

  1. Each value generated is identified using 25 unique keys.
  2. I use boost::hash_combine to generate the seed for the hash table from these 25 keys (a sketch of such a hash functor follows this list).
  3. I store the values in a hash table declared as

    boost::unordered_map<Key_Object, Data_Object, HashFunction> hashState;

  4. I profiled my algorithm and found that nearly 95% of the run time is spent retrieving/inserting data into the hash table.

  5. These are the details of my hash table:

    hashState.size() 1880

    hashState.load_factor() 0.610588

    hashState.bucket_count() 3079

    hashState.max_size() 805306456

    hashState.max_load_factor() 1

    hashState.max_bucket_count() 805306457
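
A minimal sketch of a hash functor along these lines, assuming the 25 keys are plain ints (the real Key_Object and Data_Object may look different):

    #include <boost/functional/hash.hpp>
    #include <boost/unordered_map.hpp>
    #include <cstddef>

    // Hypothetical key layout: 25 integer keys (the actual Key_Object may differ).
    struct Key_Object {
        int keys[25];
        bool operator==(const Key_Object& other) const {
            for (int i = 0; i < 25; ++i)
                if (keys[i] != other.keys[i]) return false;
            return true;
        }
    };

    struct HashFunction {
        std::size_t operator()(const Key_Object& k) const {
            std::size_t seed = 0;
            // Fold all 25 keys into one seed, as described in step 2.
            for (int i = 0; i < 25; ++i)
                boost::hash_combine(seed, k.keys[i]);
            return seed;
        }
    };

    // boost::unordered_map<Key_Object, Data_Object, HashFunction> hashState;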

I have the following two questions:

  1. Is there anything I can do to improve the performance of the hash table's insert/retrieve operations?

  2. The C++ STL has hash_multimap, which would also suit my requirements. How does Boost's unordered_map compare with hash_multimap in terms of insert/retrieve performance?

A: 

If your hash function is not the culprit, the best you can do is probably to use a different map implementation. Since your keys are quite large, using an unordered_map from the Boost.Intrusive library might be the best option. Alternatively, you could try closed hashing: Google SparseHash or MCT. Profiling is certainly needed, though, because closed hashing is recommended when elements are small enough. (SparseHash is more established and better tested, but MCT doesn't need the set_empty_key()/set_deleted_key() calls.)
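
As an illustration of the closed-hashing route, a rough sketch using SparseHash's dense_hash_map; the header path depends on the SparseHash version, Key_Object/Data_Object/HashFunction are your own types, and make_unused_key() is a hypothetical helper:

    #include <sparsehash/dense_hash_map>  // <google/dense_hash_map> in older releases

    // Replacement for the boost::unordered_map; dense_hash_map uses open
    // addressing, so probes stay within a contiguous array of entries.
    google::dense_hash_map<Key_Object, Data_Object, HashFunction> hashState;

    // dense_hash_map requires a designated "empty" key that will never be
    // used as a real key (and a "deleted" key if erase() is needed).
    Key_Object empty_key = make_unused_key();   // hypothetical helper
    hashState.set_empty_key(empty_key);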

EDIT:

I just noticed there is no intrusive map in the Boost library I mentioned, only set and multiset. Still, you can try the two closed hashing libraries.

EDIT 2:

STL hash_map is not standard; it is a vendor extension and not portable across compilers.

doublep
A: 

Are you sure that the hash function you are using is not the bottleneck? What percentage of the time does the hash function take? Can you run the same test, replacing the inserts/retrievals with a plain call to the hash function?
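
One way to run that comparison, as a minimal sketch (it reuses the Key_Object and HashFunction placeholders from the question and C++11 <chrono> for brevity):

    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <vector>

    // Time only the hash computation over the same keys that would be inserted,
    // to see how much of the 95% is hashing versus bucket/collision work.
    void time_hash_only(const std::vector<Key_Object>& keys) {
        HashFunction hash;
        std::size_t sink = 0;  // keeps the loop from being optimized away
        auto start = std::chrono::steady_clock::now();
        for (const Key_Object& k : keys)
            sink ^= hash(k);
        auto stop = std::chrono::steady_clock::now();
        std::cout << "hash only: "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
                  << " ms (sink=" << sink << ")\n";
    }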

Vicente Botet Escriba
The hash function that generates the seed value takes 60% of the computation time. It does a boost::hash_combine over 25 key values. I tried using a simpler hash key, but then it takes a very long time to retrieve a value. FYI, I am looking at inserting and retrieving > 100,000 values into the hash table.
Amrish
Which one spent less time overall: boost::hash_combine on 25 key values, or the simple hash?
Vicente Botet Escriba
With the simple hash, hashing took a mere 1% of the algorithm's entire run time, but the algorithm as a whole took much longer to run. I believe that was because more time was spent retrieving/inserting values as a result of too many collisions!
Amrish