views:

978

answers:

5

Hello,

I have faced a quite strange thing related to stdext hashmap. I have to work with a lot of objects and it is a priority to access the elements in a fast way. My program read the object values from a file and if it's a new element, then insert this value in a hashmap, if it is an already processed object, then changes the stored value in hashmap.

My problem is related to hashmap(stdext). I haven't found any initialisation option for this container. The key element is an unsigned integer (uint64), and and object is stored in the hashmap with this key, with the size of 160 KBytes. The program is working, but I have to wait too much, when the number of objects in hashmap reaches a limit. After this the hashmap is working well again, as I expect it. I thought maybe it is a reorganising step.

But these steps are critical, because after a certain number of objects it takes 5 hours for this step to be done, while a normal processing step is about 2-3 minutes. After this the processing becomes "normal".

Has anyone faced such problems? Does anybody know something deeper about this hashmap? I haven't found relevant anything related to this topic. I would be very grateful if someone could help me.

Best wishes,

tamás

+3  A: 

Copying your object is probably very expensive (given its size). Try storing a pointer to the object instead of the entire thing, or a boost::shared_ptr if you want to make deletion simple.

This way, when the data structure reorganizes itself, copying is very fast, since it is just a pointer assignment, instead of whatever was required to copy your huge object.

Greg Rogers
A: 

Sounds like you are getting hit by the cost of copying your objects when the container decides to allocate room for more objects. The two simplest options are:

  1. If you know the number of objects you will be inserting using the void resize(size_type n) function.

  2. You could store pointers in the container rather than the objects themselves.

Andrew Khosravian
The resize function sounds good, but the compiler doeasn't like it, because "'resize' : is not a member of 'stdext::hash_map<_Kty,_Ty>'". Did you mean so, that I should use it in the following way: for example: hashmap0.resize(100)
Dinkumware's hash_map implementation doesn't have a resize member. Not sure if TR1's unordered_map will - Andrew might have been referring to that?IIRC you can construct it with a struct for one of it's template parameters that describes its initial size.
Peter
A: 

Your problem is that of either reallocation (and copy) or hash collision.

Vectors suffer from this too much (and has an exponentially increasing allocation size). However, maps, sets and hashmaps typically are less affected. These latter classes do not have a resize() member in most cases. So,

  • You can pass a custom allocator (that most containers allow AFAIK)
  • Choose a better hashing algorithm

You can, if you so wish, check with your implementation of hashmap to see if they follow the move-idiom. If they don't give it a shot.

dirkgently
A: 

I am trying to use the hashmap parameters with non-default values: the bucket_size and min_buckets. The default values for this are bucket_size=4 and min_buckets=8. I have changed them in xhash file to bigger values, because I didn't manage to change these values from code. I think that min_buckets is critical in my application, I try to "finetune" to get a better performance avoiding the reorganising step.

But so I get another problem, everything works fine until I try to clear the hashmap. It takes a lot of time. When I use it with the default values runs very fast.

Was it a bad step to change the xhash file? Has anybody used non-defaultvalues before? What are the reasons for this slow clear?

My second question is related to storing pointers in hashmap. The idea is clear, but how could I manage to free the pointed memory. I should create pointers to my objects; these pointers are stored in hashmap and when I need the value I can have it dereferencing this pointer. But how could I clear the memory after I saved the map? Maybe it is trivial question, but now I don't see the solution.

Thanks for your already posted answers.

+1  A: 

Use Java. You will be shocked by order by which it beats C++ when it comes to efficient hash map inserts (better allocators, no copying) and matches it on lookups.