A recent talk about unordered_map in C++ made me realize that I should use unordered_map for most cases where I previously used map, because of the efficiency of lookup (amortized O(1) vs. O(log n)). Most of the time I use a map with either ints or std::strings as keys, so I have no problem with the definition of the hash function. The more I thought about it, the more I came to realize that I can't find any reason to use a std::map over an unordered_map in the case of simple types -- I took a look at the interfaces and didn't find any significant differences that would impact my code.

Hence the question: is there any real reason to use std::map over unordered_map in the case of simple types like int and std::string?

I'm asking from a strictly programming point of view -- I know that it's not yet fully standard, and that it may pose problems with porting.

Also, I expect that one of the correct answers might be "it's more efficient for smaller sets of data" because of smaller overhead (is that true?) -- hence I'd like to restrict the question to cases where the number of keys is non-trivial (>1,024).

Edit: duh, I forgot the obvious (thanks GMan!) -- yes, maps are of course ordered -- I know that, and am looking for other reasons.
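
To make it concrete, here's a minimal sketch of the kind of usage I mean (the names are mine, purely for illustration) -- the two containers are interchangeable behind a typedef:

#include <map>
#include <string>
#include <tr1/unordered_map>  // <unordered_map> once C++0x lands

// Swap the typedef and nothing else in the code needs to change.
typedef std::tr1::unordered_map<std::string, int> Index;
// typedef std::map<std::string, int> Index;  // drop-in alternative

int bump(Index& idx, const std::string& key) {
    return ++idx[key];                    // identical operator[] semantics
}

bool contains(const Index& idx, const std::string& key) {
    return idx.find(key) != idx.end();    // identical find/end idiom
}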

+7  A: 

Don't forget that maps keep their elements ordered. If you can't give up that, obviously you can't use an unordered_map.
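
For instance, iterating a map always visits the keys in sorted order, which an unordered_map can't promise (a quick sketch):

#include <iostream>
#include <map>
#include <string>

int main() {
    std::map<std::string, int> m;
    m["pear"] = 3; m["apple"] = 1; m["orange"] = 2;

    // Prints apple, orange, pear -- sorted, regardless of insertion order.
    for (std::map<std::string, int>::iterator it = m.begin(); it != m.end(); ++it)
        std::cout << it->first << '\n';

    // An unordered_map would visit these in some unspecified bucket order.
    return 0;
}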

Something else to keep in mind is that unordered_maps generally use more memory. A map just has a few house-keeping pointers plus memory for each object. An unordered_map, in contrast, has a big array (which can get quite big in some implementations) plus additional memory for each object. If you need to be memory-aware, a map should prove better, because it lacks the large array.

So, if you need pure lookup-retrieval, I'd say an unordered_map is the way to go. But there are always trade-offs, and if you can't afford them, then you can't use it.

Just from personal experience, I found an enormous improvement in performance (measured, of course) when using an unordered_map instead of a map in a main entity look-up table.

On the other hand, I found it was much slower at repeatedly inserting and removing elements. It's great for a relatively static collection of elements, but if you're doing tons of insertions and deletions the hashing + bucketing seems to add up. (Note, this was over many iterations.)

GMan
+1: yes, forgot the obvious ordered property :), and the memory tip is something I wasn't aware of -- thanks
Kornel Kisielewicz
One more thing about the large(r) memory block property of unordered_map vs. map (or vector vs. list): the default process heap (talking Windows here) is serialized. Allocating (small) blocks in large quantities in a multithreaded application is very expensive.
RA
RA: You can somewhat control that with your own allocator type combined with any container, if you think it matters for any particular program.
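
To illustrate (a minimal sketch using C++11-style reduced allocator requirements -- a C++03 allocator needs more boilerplate -- and `ArenaAllocator` is a made-up name, with malloc standing in for a real per-thread arena):

#include <cstdlib>
#include <functional>
#include <map>
#include <new>
#include <string>
#include <utility>

// Hypothetical allocator: a real one would draw from a per-thread arena
// instead of the (serialized) process heap that malloc uses here.
template <typename T>
struct ArenaAllocator {
    typedef T value_type;
    ArenaAllocator() {}
    template <typename U> ArenaAllocator(const ArenaAllocator<U>&) {}
    T* allocate(std::size_t n) {
        if (void* p = std::malloc(n * sizeof(T))) return static_cast<T*>(p);
        throw std::bad_alloc();
    }
    void deallocate(T* p, std::size_t) { std::free(p); }
};
template <typename T, typename U>
bool operator==(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>&, const ArenaAllocator<U>&) { return false; }

// Any standard container accepts it as its last template parameter:
typedef std::map<int, std::string, std::less<int>,
                 ArenaAllocator<std::pair<const int, std::string> > > ArenaMap;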
Roger Pate
+3  A: 

Hash tables have higher constants than common map implementations, and those constants become significant for small containers. Is your max size 10, 100, or maybe even 1,000 or more? The constants are the same as ever, but O(log n) is close to O(k) at those sizes. (Remember, logarithmic complexity is still really good.)

What makes a good hash function depends on your data's characteristics. So if I don't plan on investing in a custom hash function (though I can certainly change my mind later, and easily, since I typedef damn near everything), and even though the defaults are chosen to perform decently for many data sources, I find the ordered nature of map to be enough of a help initially that I still default to map rather than a hash table in that case.

Plus, that way you don't have to even think about writing a hash function for other (usually UDT) types; you just write operator< (which you want anyway).
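
To illustrate the difference in effort (a sketch; `Point` and `PointHash` are made-up names, and the hash combine is deliberately naive):

#include <cstddef>
#include <map>
#include <string>
#include <tr1/functional>
#include <tr1/unordered_map>

struct Point { int x, y; };

// map only needs operator< -- which you usually want anyway:
bool operator<(const Point& a, const Point& b) {
    return a.x != b.x ? a.x < b.x : a.y < b.y;
}

// unordered_map needs equality *and* a hash function:
bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}
struct PointHash {
    std::size_t operator()(const Point& p) const {
        // naive combine; hash quality matters for real data
        return std::tr1::hash<int>()(p.x) * 31u + std::tr1::hash<int>()(p.y);
    }
};

std::map<Point, std::string> ordered;                           // uses operator<
std::tr1::unordered_map<Point, std::string, PointHash> hashed;  // PointHash + operator==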

Roger Pate
@Roger, do you know the approximate number of elements at which unordered_map bests map? I'll probably write a test for it anyway, though... (+1)
Kornel Kisielewicz
@Kornel: It doesn't take very many; my tests were with about 10,000 elements. If we want a *really* accurate graph, you could look at an implementation of `map` and one of `unordered_map`, on a certain platform with a certain cache size, and do a complex analysis. :P
GMan
Depends on implementation details, compile-time tuning parameters (easy to support if you're writing your own implementation), and even the specific machine used for the tests. Just like for the other containers, the committee only sets the broad requirements.
Roger Pate
+3  A: 

I'd echo roughly the same point GMan made: depending on the type of use, std::map can be (and often is) faster than std::tr1::unordered_map (using the implementation included in VS 2008 SP1).

There are a few complicating factors to keep in mind. For example, in std::map, you're comparing keys, which means you only ever look at enough of the beginning of a key to distinguish between the right and left sub-branches of the tree. In my experience, nearly the only time you look at an entire key is if you're using something like int that you can compare in a single instruction. With a more typical key type like std::string, you often compare only a few characters or so.

A decent hash function, by contrast, always looks at the entire key. In other words, even if the table lookup has constant complexity, the hash itself takes roughly linear time (in the length of the key, not the number of items). With long strings as keys, a std::map might finish a search before an unordered_map would even start its search.
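
A sketch of the asymmetry (FNV-1a here is just a stand-in for whatever hash the implementation uses):

#include <cstddef>
#include <string>

// A key comparison can stop at the first differing character:
bool key_less(const std::string& a, const std::string& b) {
    return a < b;   // cost ~ position of the first mismatch
}

// A hash has to visit every character of the key:
std::size_t fnv1a(const std::string& s) {
    std::size_t h = 2166136261u;
    for (std::size_t i = 0; i < s.size(); ++i)
        h = (h ^ static_cast<unsigned char>(s[i])) * 16777619u;
    return h;   // cost ~ length of the key, every time
}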

Second, while there are several methods of resizing hash tables, most of them are pretty slow -- to the point that unless lookups are considerably more frequent than insertions and deletions, std::map will often be faster than std::unordered_map.

Of course, as I mentioned in the comment on your previous question, you can also use a table of trees. This has both advantages and disadvantages. On one hand, it limits the worst case to that of a tree. It also allows fast insertion and deletion, because (at least when I've done it) I've used a fixed-size table. Eliminating all table resizing lets you keep your hash table a lot simpler and typically faster.
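
Something along these lines (a sketch of the idea, not Jerry's actual code; std::tr1::hash stands in for whatever hash you'd use):

#include <cstddef>
#include <map>
#include <tr1/functional>

// Fixed-size table of trees: the hash picks a bucket, the bucket is a
// std::map. The table never resizes, and the worst case per bucket stays
// logarithmic in the bucket's size.
template <typename Key, typename Value>
class TreeTable {
    enum { N = 1024 };   // fixed bucket count
    std::map<Key, Value> buckets_[N];

    std::size_t index(const Key& k) const {
        return std::tr1::hash<Key>()(k) % N;   // needs hash *and* operator<
    }
public:
    Value& operator[](const Key& k) { return buckets_[index(k)][k]; }
    std::size_t erase(const Key& k) { return buckets_[index(k)].erase(k); }
    bool contains(const Key& k) const {
        const std::map<Key, Value>& b = buckets_[index(k)];
        return b.find(k) != b.end();
    }
};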

Edit: Oops, I almost forgot to mention one other point: the requirements for hashing and tree-based maps are different. Hashing obviously requires a hash function and an equality comparison, whereas ordered maps require a less-than comparison. The hybrid I mentioned, of course, requires both. For the common case of using a string as the key this isn't really a problem, but some types of keys suit ordering better than hashing (or vice versa).

Jerry Coffin
+1: the hash resizing and eager string comparison for longer strings is quite a valid point
Kornel Kisielewicz
Hash resizing can be damped by `dynamic hashing` techniques, which consist of having a transition period where, each time you insert an item, you also rehash `k` other items. Of course, it means that during the transition you have to search 2 different tables...
Matthieu M.
+3  A: 

I would just point out that... there are many kinds of unordered_map.

Look up the Wikipedia article on hash maps. Depending on which implementation is used, the characteristics in terms of look-up, insertion and deletion might vary quite significantly.

And that's what worries me the most about the addition of unordered_map to the STL: they will have to choose a particular implementation, as I doubt they'll go down the Policy road, and so we will be stuck with an implementation for the average use and nothing for the other cases...

For example, some hash maps have linear rehashing, where instead of rehashing the whole hash map at once, a portion is rehashed at each insertion, which helps amortize the cost.
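
A bare-bones sketch of that idea (hypothetical code, ignoring duplicate keys and deletion during a transition):

#include <cstddef>
#include <list>
#include <tr1/functional>
#include <utility>
#include <vector>

// During a resize both tables coexist; every insert also migrates a few
// entries from the old table, so no single insert pays the whole rehash.
template <typename Key, typename Value>
class IncrementalHashMap {
    typedef std::list<std::pair<Key, Value> > Bucket;
    std::vector<Bucket> old_, cur_;
    std::size_t size_, cursor_;
    enum { kStep = 4 };   // entries migrated per insert

    std::size_t slot(const Key& k, std::size_t n) const {
        return std::tr1::hash<Key>()(k) % n;
    }
    void migrate() {
        for (std::size_t moved = 0; cursor_ < old_.size() && moved < kStep; ) {
            if (old_[cursor_].empty()) { ++cursor_; continue; }
            std::pair<Key, Value> e = old_[cursor_].front();
            old_[cursor_].pop_front();
            cur_[slot(e.first, cur_.size())].push_back(e);
            ++moved;
        }
        if (cursor_ == old_.size()) { old_.clear(); cursor_ = 0; }
    }
public:
    IncrementalHashMap() : cur_(8), size_(0), cursor_(0) {}

    void insert(const Key& k, const Value& v) {
        if (old_.empty() && size_ > 2 * cur_.size()) {  // load factor 2: grow
            old_.swap(cur_);
            cur_.assign(2 * old_.size(), Bucket());
        }
        cur_[slot(k, cur_.size())].push_back(std::make_pair(k, v));
        ++size_;
        if (!old_.empty()) migrate();
    }

    // Mid-transition, a lookup has to check both tables.
    Value* find(const Key& k) {
        Bucket& b = cur_[slot(k, cur_.size())];
        for (typename Bucket::iterator it = b.begin(); it != b.end(); ++it)
            if (it->first == k) return &it->second;
        if (!old_.empty()) {
            Bucket& ob = old_[slot(k, old_.size())];
            for (typename Bucket::iterator it = ob.begin(); it != ob.end(); ++it)
                if (it->first == k) return &it->second;
        }
        return 0;
    }
};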

Another example: some hash maps use a simple list of nodes for each bucket, others use a map, others don't use nodes at all but find the nearest free slot, and lastly some will use a list of nodes but reorder it so that the last accessed element is at the front (like a caching thing).

So at the moment I tend to prefer the std::map or perhaps a loki::AssocVector (for frozen data sets).

Don't get me wrong, I'd like to use std::unordered_map, and I may in the future, but it's difficult to "trust" the portability of such a container when you think of all the ways of implementing it and the various performance characteristics that result.

Matthieu M.
+1: valid point -- life was easier when I was using my own implementation -- at least I knew **where** it sucked :>
Kornel Kisielewicz
A: 

If you want to compare the speed of your std::map and std::unordered_map implementations, you can use Google's sparsehash project, which has a time_hash_map program to time them. For example, with gcc 4.4.2 on an x86_64 Linux system:

$ ./time_hash_map
TR1 UNORDERED_MAP (4 byte objects, 10000000 iterations):
map_grow              126.1 ns  (27427396 hashes, 40000000 copies)  290.9 MB
map_predict/grow       67.4 ns  (10000000 hashes, 40000000 copies)  232.8 MB
map_replace            22.3 ns  (37427396 hashes, 40000000 copies)
map_fetch              16.3 ns  (37427396 hashes, 40000000 copies)
map_fetch_empty         9.8 ns  (10000000 hashes,        0 copies)
map_remove             49.1 ns  (37427396 hashes, 40000000 copies)
map_toggle             86.1 ns  (20000000 hashes, 40000000 copies)

STANDARD MAP (4 byte objects, 10000000 iterations):
map_grow              225.3 ns  (       0 hashes, 20000000 copies)  462.4 MB
map_predict/grow      225.1 ns  (       0 hashes, 20000000 copies)  462.6 MB
map_replace           151.2 ns  (       0 hashes, 20000000 copies)
map_fetch             156.0 ns  (       0 hashes, 20000000 copies)
map_fetch_empty         1.4 ns  (       0 hashes,        0 copies)
map_remove            141.0 ns  (       0 hashes, 20000000 copies)
map_toggle             67.3 ns  (       0 hashes, 20000000 copies)
Blair Zajac