tags:

views:

217

answers:

3

I have a boost::unordered_map, but it appears to be in order, giving me an overwhelming feeling of "You're Doing It Wrong". Why is the output to this in order? I would've expected the underlying hashing algorithm to have randomized this order:

#include <iostream>
#include <boost/unordered_map.hpp>

int main()
{
    boost::unordered_map<int, int> im;

    for(int i = 0; i < 50; ++i)
    {
        im.insert(std::make_pair(i, i));
    }

    boost::unordered_map<int, int>::const_iterator i;

    for(i = im.begin(); i != im.end(); ++i)
    {
        std::cout << i->first << ", " << i->second << std::endl;
    }

    return 0;
}

...gives me...

0, 0
1, 1
2, 2
...
47, 47
48, 48
49, 49

Upon examination of boost's source code:

inline std::size_t hash_value(int v)
{
    return static_cast<std::size_t>(v);
}

...which would explain it. The answers below hold the higher level thinking, as well, which I found useful.

+12  A: 

While I can't speak to the boost internals as I'm not a C++ guy, I can propose a few higher-level questions that may alleviate your concerns:

1) What are the guarantees of an "unordered" map? Say you have an ordered map, and you want to create a map that does not guarantee ordering. An initial implementation may simply use the ordered map. It's almost never a problem to provide stronger guarantees than you advertise.

2) A hash function is something that hashes X -> int. If you already have an integer, you could use the identity function. While it may not be the most efficient in all cases, it could explain the behavior you're seeing.

Basically, seeing behavior like this is not necessarily a problem.

Steven Schlansker
I was not expecting the use of an identity function for a hash function, but it would seem that this is exactly what boost is doing. I suppose without domain knowledge of the inputs, such a hash is going to work about as well as any.
Thanatos
@Thanatos - since the result of the hash function is a `size_t` (at least for `boost::unordered_map`) and an `int` will always fit into a `size_t`, there's no reason to do anything but an identity function to hash an `int`.
Michael Burr
@Michael Burr - I wasn't concerned about the semantics of the language - I'm looking at the purpose of a hash function being to prevent collisions between input, so that the hash table has a chance of being O(1). I'm used to seeing rather random hash function outputs.
Thanatos
@Thantos: but if the property to be hashed is smaller than or equal in size to the hash, then there's no reason to perform any transformation on the property to produce a perfect hash - regardless of language. The identify function produces a nice, simple perfect hash in those cases.
Michael Burr
+8  A: 

It is probably because your hashes are small integers. Hash tables usually calculate the number of bucket in which to put the item like this: bucket_index = hash%p where p is a prime number, which is the number of hashtable buckets, which is large enough to provide low frequency of collisions.

For integers hash equals to the value of the integer. You have a lot of data, so hashtable selects a large p. For any p larger than i, bucket_index = i%p = i.

When iterating, the hashtable returns items from its buckets in order of their indexes, which for you is the order of keys. :)

Try using larger numbers if you want to see some randomness.

Rotsor
+1  A: 

You're doing it right. unordered_map doesn't claim to have random order. In fact, it makes no claims about order whatsoever. You shouldn't expect anything whatsoever in terms of order, and that goes for disorder!

John