views:

308

answers:

2

Hi, I was comparing a simple hash function that I wrote (it just multiplies a running value by a prime, mod another prime number, which is the table size) against the STL's, and it turns out the STL is slower by 100 times. This is the test code that I wrote:

stdext::hash_map<string, int> hashDict;
for (int l = 0; l < size; ++l) {
    hashDict[arr[l]] = l;
}
long int before3 = GetTickCount();
int c = 0;
while (c < size) {
    hashDict[arr[c]];
    c++;
}
long int after3 = GetTickCount();
cout << "for stl class, the time is " << (after3 - before3) / 1000.0 << '\n';
cout << "the average is " << ((after3 - before3) / 1000.0) / long(size) << '\n';

The dictionary has about 200k elements, and the table for the hash function I wrote has 3M entries, so maybe the difference has to do with the table size of the STL class being very small. Does anyone know what the table size of the STL hash_map is, its collision rate, etc.?
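
For reference, a hash of the kind described above (multiply by a prime, reduce modulo a prime table size) might look like the following sketch. The constants, names, and table size here are assumptions for illustration, not the actual code being benchmarked:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A prime near the 3M entries mentioned above (assumed value).
const std::size_t kTableSize = 3000017;

// Hypothetical multiply-by-prime hash: fold each character into a
// running value and keep it reduced modulo the prime table size.
std::size_t simpleHash(const std::string& s) {
    std::size_t h = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        h = (h * 131 + static_cast<unsigned char>(s[i])) % kTableSize;
    }
    return h;  // always a valid bucket index in [0, kTableSize)
}
```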

+2  A: 

The VS2008 STL implementation uses the following hash function for strings:

size_t _Val = 2166136261U;
while(_Begin != _End)
    _Val = 16777619U * _Val ^ (size_t)*_Begin++;
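
Wrapped into a self-contained function, that loop looks like the sketch below. The wrapper and its name are mine; in VS2008 this lives inside `_Hash_value` in `<xhash>`. It is the FNV-1 hash: multiply by the FNV prime, then XOR in each byte. `unsigned int` is assumed to be 32 bits here, as `size_t` is on VS2008 x86:

```cpp
#include <cassert>
#include <string>

// FNV-1 over the bytes of a string, matching the VS2008 loop above.
// 2166136261 is the 32-bit FNV offset basis, 16777619 the FNV prime.
unsigned int fnv1(const std::string& s) {
    unsigned int val = 2166136261U;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        // Precedence: multiply first, then XOR in the next byte.
        val = 16777619U * val ^ static_cast<unsigned char>(s[i]);
    }
    return val;
}
```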

This is no less efficient than yours, and certainly not 100x slower, and I doubt the C++ Builder version is much different. The difference lies either in the measurement (GetTickCount() is not very precise) or in operations other than computing hash values.

I don't know about C++ Builder, but some STL implementations have a lot of extra checks and assertions built into the debug version. Have you tried profiling an optimized release build?

If you post a minimal but complete example we can help you figure out what is going on, but without more code there's really not much to say.

Tim Sylvester
I am guessing 2166136261U is the table size and 16777619U is the prime number? What are those numbers as plain ints? And where did you find that code?
SuperString
No, those are just constants chosen to minimize collisions; they have nothing to do with the particular hash table. The final value is forced into the table size with a modulus just as in your example, but that's irrelevant in terms of performance. The code is in the function `_Hash_value` in the `<xhash>` header, but it's implementation-specific.
Tim Sylvester
So what is the table size of the STL hash_map class?
SuperString
It has 262,145 buckets after 200,000 items are inserted. It could be literally anything, though; because the spec doesn't define it, it's left up to the implementation.
Tim Sylvester
A: 

To post this as an answer: I think you're getting misleading measurements. You're not actually using the hashed values, just referencing them, so the compiler may well be optimizing out the accesses to your hash table but not to the hash_map.

Except in extreme and rare circumstances, you aren't going to beat a professional implementation by a large integral factor while still using the same algorithm, which you are. You're at least as likely to win the big prize in the lottery as to do it by a factor of 100.

Change your test code to use the values. Sum them up and print out the sum. That's fast and forces the program to look up each value.
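
The fix described above can be sketched as follows. `std::unordered_map` and `<chrono>` stand in for `stdext::hash_map` and `GetTickCount()` so the example compiles portably, and the array becomes a parameter; the point is the same, namely that summing and printing the values forces the program to perform every lookup:

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Build the dictionary, then time lookups whose results are actually used.
long long timedLookupSum(const std::vector<std::string>& arr) {
    std::unordered_map<std::string, int> dict;
    for (std::size_t i = 0; i < arr.size(); ++i)
        dict[arr[i]] = static_cast<int>(i);

    long long sum = 0;
    auto before = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < arr.size(); ++i)
        sum += dict[arr[i]];  // use the value, don't just touch it
    auto after = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed = after - before;
    std::cout << "for the hash map, the time is " << elapsed.count() << '\n';
    std::cout << "the sum is " << sum << '\n';  // printing keeps the lookups live
    return sum;
}
```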

David Thornley