views:

208

answers:

5

I am very confused by the name 'unordered_map'. The name suggests that the keys are not ordered at all. But I always thought they are ordered by their hash value. Or is that wrong (because the name implies that they are not ordered)?

Or to put it different: Is this

typedef map<K, V, HashComp<K> > HashMap;

with

template<typename T>
struct HashComp {
    bool operator<(const T& v1, const T& v2) const {
        return hash<T>()(v1) < hash<T>()(v2);
    }
};

the same as

typedef unordered_map<K, V> HashMap;

? (Edit: Well, ok not exactly, STL will complain here because there may be keys k1,k2 and neither k1 < k2 nor k2 < k1. You would need to use multimap and overwrite the equal-check.)

Or again differently: When I iterate through them, can I assume that the key-list is ordered by their hash value?

+1  A: 

If you want an analogy, look at the RDBMS of your choice.

If you don't specify an ORDER BY clause when performing a query, the results are returned "unordered" - that is, in whatever order the database feels like. The order is not specified, and the system is free to "order" them however it likes in order to get the best performance.

Anon.
Are they really unordered? Would they not come out ordered by the hash value?
Albert
I don't like that analogy, because in unordered_map the order is not some obscure internal detail, but is actually the consequence of the hash algorithm. In fact *if you have an optimal hash function, the number of operations performed during lookup, insertion, and removal of an arbitrary element does not depend on the number of elements in the sequence* (http://tiny.cc/vqm58)
Lorenzo
+1  A: 

You are right, unordered_map is actually hash ordered. Note that most current implementations (pre TR1) call it hash_map.

The IBM C/C++ compiler documentation remarks that if you have an optimal hash function, the number of operations performed during lookup, insertion, and removal of an arbitrary element does not depend on the number of elements in the sequence, so this mean that the order is not so unordered...

Now, what does it mean that it is hash ordered? As an hash should be unpredictable, by definition you can't take any assumption about the order of the elements in the map. This is the reason why it has been renamed in TR1: the old name suggested an order. Now we know that an order is actually used, but you can disregard it as it is unpredictable.

Lorenzo
Eh, why was this downvoted? That seemed to me so far the most correct answer. Isn't it? Please those who don't think it is, add some comments.
Albert
See the other answers. A very common implementation orders the keys by `hash(Key) % NumberOfBuckets`, which is definitely not the same as ordering by `hash(Key)`. One of the important consequences is that the order can change if more elements are inserted and the number of buckets grows. If you'd incorrectly assume it was hash-ordered, the order would not change if you add more elements.
MSalters
@MSalters: that's why I wrote you have not to rely on any hash order as it is unpredictable.
Lorenzo
It may be unpredictable, but - if things were as you describe them - at least the order would be stable. That means that if `a` precedes `b` now, it will always do so. Reality and the standard disagree. This is especially relevant for algorithms that insert or remove elements while iterating through a container.
MSalters
+3  A: 

"Unordered" doesn't mean that there isn't a linear sequence somewhere in the implementation. It means "you can't assume anything about the order of these elements".

For example, people often assume that entries will come out of a hash map in the same order they were put in. But they don't, because the entries are unordered.

As for "ordered by their hash value": hash values are generally taken from the full range of integers, but hash maps don't have 2**32 slots in them. The hash value's range will be reduced to the number of slots by taking it modulo the number of slots. Further, as you add entries to a hash map, it might change size to accommodate the new values. This can cause all the previous entries to be re-placed, changing their order.

In an unordered data structure, you can't assume anything about the order of the entries.

Ned Batchelder
I thought I can assume that they come out ordered by their hash value.
Albert
I've added more...
Ned Batchelder
Yes sure but still they would be ordered by their hash value. Of course if the hash value is the same for different keys, the order is undefined.
Albert
They aren't 'ordered by their hash value'. All that depends entirely on the implementation. If it is open hashing they are ordered by their hash value modulo a prime if they didn't cause a collision, and by their hash value modulo a prime modulo another prime if there was. If it is a chaining/bucket implementation like most are, the buckets are ordered on hash value modulo the number of buckets, and the items in the list are ordered probably on insertion time. You can't call any of that 'ordered on hash value', or indeed 'ordered' in any useful sense at all.
EJP
@Albert: If the hash map has 1009 slots (a prime number), and your two values hash to 13116 and 13118, they will end up in slots 1008 (13116 % 1009) and 1 (13118 % 1009). When iterating the entries, the second one will come out first. You can't assume anything about the ordering in an unordered map.
Ned Batchelder
Thanks for this example. Thanks also EJP! That really clears it up to me.
Albert
+2  A: 

As the name unordered_map suggests, no ordering is specified by the C++0x standard. An unordered_map's apparent ordering will be dependent on whatever is convenient for the actual implementation.

anon
Why is that so? Isn't it obvious to order by hash value?
Albert
@Albert Nothing says an unordered_map must use hashing. And in fact when collisions are taken into account, the order of an unordered_map is not predictable from a hash function.
anon
@Albert: it's so to let the implementors decide the best ordering that fits their implementation. unordered_map doesn't *guarantee* any order, you don't rely on it, the implementors decide the best order (if any) to deliver the best performance; end of the story. It's in the spirit of C++ standard to require the bare minimum and avoid useless constraints to let the implementors provide the best performance they can.
Matteo Italia
Who/why downvoted this? For me, it's the most standard-conforming answer.
Matteo Italia
@Matteo: I prefer Dean's answer, I actually learned something useful reading it.
Wizard
Ok, but this answer makes you know on what you can rely and what's the rationale behind it: you cannot rely on it being ordered in any way, it's like that to let the implementors do their work at best. Talking about the internals of the implementations *is* interesting, but as an end-user of unordered_map what you should rely on is just what the standard says.
Matteo Italia
@Neil: sorry, I forgot it in the last comment; it was a reply to @Wizard's comment.
Matteo Italia
+4  A: 

In answer to your edited question, no those two snippets are not equivalent at all. std::map stores nodes in a tree structure, unordered_map stores them in a hashtable*.

Keys are not stored in order of their "hash value" because they're not stored in any order at all. They are instead stored in "buckets" where each bucket corresponds to a range of hash values. Basically, the implementation goes like this:

function add_value(object key, object value) {
   int hash = key.getHash();

   int bucket_index = hash % NUM_BUCKETS;
   if (buckets[bucket_index] == null) {
       buckets[bucket_index] = new linked_list();
   }
   buckets[bucket_index].add(new key_value(key, value));
}

function get_value(object key) {
   int hash = key.getHash();

   int bucket_index = hash % NUM_BUCKETS;
   if (buckets[bucket_index] == null) {
       return null;
   }

   foreach(key_value kv in buckets[bucket_index]) {
       if (kv.key == key) {
           return kv.value;
       }
   }
}

Obviously that's a serious simplification and real implementation would be much more advanced (for example, supporting resizing the buckets array, maybe using a tree structure instead of linked list for the buckets, and so on), but that should give an idea of how you can't get back the values in any particular order. See wikipedia for more information.


* Technically, the internal implementation of std::map and unordered_map are implemented-defined, but the standard requires certain Big-O complexity for operations that implies those internal implementations

Dean Harding
By far the best answer.
Lorenzo
Thanks a lot. That really clears it up. I always thought that a hashtable would be implemented internally using a tree structure (just as a map from hash values to buckets). Seems I was terribly wrong there.
Albert
This was downvoted again by at least someone. What is all this downvoting about here? Can those people who downvote sth please give some comments?
Albert
@Albert: I don't know why, but rather than just linking to Wikipedia, I've added some pseuodocode for a simple implementation of a hashtable that should show how it's not possible to get any meaningful "order". I know you've already got the gist, but just in case anybody else comes along :-)
Dean Harding
Thanks for the pseudocode. I am starting to get it. This gets me now to the new question: http://stackoverflow.com/questions/3176761/usual-hashtable-implementation-compared-to-tree
Albert
Precision: the big-O complexity for `map` does not impose a binary tree (and certainly not a red-black tree). There are a number of structures that could fit here. AVL trees, Splay trees and Skip Lists among them (the last not being a tree, as the name implies).
Matthieu M.