ansaurus

Question

Difference in performance between map and unordered_map in c++

Answer 1

+7 A:

Insertion into unordered_map should be O(1) on average, and retrieval should be roughly O(1) as well (it's essentially a hash table).

So either your timings are way off, or there is something wrong with your implementation or usage of unordered_map.

You need to provide some more information, and possibly how you are using the container.

Section 6.3 of n1836 gives the complexities for insertion and retrieval:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf

One issue you should consider is that your implementation may be rehashing the structure over and over as it grows, since you say you have 100 million+ items. If you have a rough idea of how many unique elements will be inserted, you can pass that number to the constructor when instantiating the container, and it will be created with a bucket table of appropriate size.
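A minimal sketch of that pre-sizing, assuming the C++11 std::unordered_map (rather than the TR1 version described in n1836) and long keys and values. The helper name make_label_table and the parameter expected_count are made up for illustration:

```cpp
#include <cstddef>
#include <unordered_map>

// Hypothetical helper: creates a node -> label table whose bucket array is
// sized up front. expected_count is the caller's rough estimate of how many
// unique keys will be inserted.
std::unordered_map<long, long> make_label_table(std::size_t expected_count) {
    // The constructor argument is a minimum bucket count. With the default
    // max_load_factor of 1.0, inserting up to expected_count elements
    // should not trigger a rehash.
    std::unordered_map<long, long> labels(expected_count);
    return labels;
}
```

Calling labels.reserve(expected_count) on an existing container is another way to get a similar effect, and it also accounts for a non-default max_load_factor.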

Beh Tou Cheh 2010-02-28 06:10:46

Yes, from my experience with dict in Python, a hash table should always be faster than a binary-tree-based map, yet at least for insertion I am finding map to be faster than unordered_map.

2010-02-28 06:23:54

Yeah, it's possible that rehashing is leading to a significant increase in insertion time, since I am not providing any hint about the possible number of elements.

2010-02-28 06:45:27

Answer 2

A:

unordered_map (at least in most implementations) gives fast retrieval, but relatively poor insertion speed compared to map. A tree is generally at its best when the data is randomly ordered, and at its worst when the data is ordered (you constantly insert at one end of the tree, increasing the frequency of re-balancing).

Given that it's ~10 million total entries, you could just allocate a large enough array, and get really fast lookups -- assuming enough physical memory that it didn't cause thrashing, but that's not a huge amount of memory by modern standards.

Edit: yes, a vector is basically a dynamic array.
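As a rough sketch of the array idea, assuming node ids are dense non-negative integers and labels fit in a long (the names make_dense_labels and kNoLabel are invented here, not taken from the question):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sentinel marking a node that has no label yet.
const long kNoLabel = -1;

// Allocates one slot per possible node id in [0, max_node_id], so that
// labels[node] is a plain indexed load with no hashing or tree walk.
std::vector<long> make_dense_labels(std::size_t max_node_id) {
    return std::vector<long>(max_node_id + 1, kNoLabel);
}
```

With this layout, storing a label is just labels[node] = label. The memory cost is proportional to the largest id rather than to the number of entries, which is why it only makes sense when the keys are densely packed.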

Edit2: The code you've added has some problems. Your while (!LabelFile.eof()) is broken: eof() only becomes true after a read has already failed, so the loop body runs one extra time on stale data. You normally want to do something like while (LabelFile >> inputdata) instead. You're also reading the data somewhat inefficiently -- what you're apparently expecting is two numbers separated by a tab. That being the case, I'd write the loop something like:

while (LabelFile >> node >> label)
    Label[node] = label;
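For context, here is one way that loop might sit inside a complete function. This is only a sketch: the function name read_labels, the long types, and the choice of std::unordered_map are assumptions, since the original declarations aren't shown.

```cpp
#include <istream>
#include <sstream>
#include <unordered_map>

// Hypothetical wrapper around the loop above. Reads whitespace-separated
// "node label" pairs and stops as soon as an extraction fails, which covers
// both end of input and malformed data.
std::unordered_map<long, long> read_labels(std::istream& in) {
    std::unordered_map<long, long> labels;
    long node, label;
    while (in >> node >> label)
        labels[node] = label;  // a repeated node overwrites its earlier label
    return labels;
}
```

It takes a std::istream&, so it works the same with a std::ifstream opened on the label file or with a std::istringstream holding a few test lines such as "1\t10\n2\t20\n".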

Jerry Coffin 2010-02-28 06:12:32

The problem is that I am hoping to extend the implementation to handle possibly around a billion entries.

2010-02-28 06:16:11

It is going to handle networks with a billion+ nodes. The map contains the label for each node in the network, and the code will be run on Hadoop in streaming mode.

2010-02-28 06:19:15

@Mitch: yes, that's exactly what I said. @akshayubha: the question isn't really the number of entries, but the density of the keys. If it's a billion keys running from 1 to 1 billion, an array will be fine. If it's a billion keys that are (say) 128 bits apiece, an array won't work at all.

Jerry Coffin 2010-02-28 06:26:05
