views:

143

answers:

3

Today I was discussing with another developer about a limitation in a third party library, where we couldn't use spaces in a string. The reasoning was that the strings were used as keys in a .NET Hashtable, and that searching a .NET HashTable was significantly slower when the keys contained spaces.

Now since I'm too lazy to write a test, but I still want to understand why this would be so, I ask my question here:

Is it slower to search a Hashtable when the string that is used contains a space?

I would not expect this, since before the search is performed, the hash is obtained using String.GetHashCode() and then that hash is used to locate the entry in the table.

Thanks!

+3  A: 

It shouldn't be slower. It uses GetHashCode() internally so set of characters in string doesn't matter.

This said, performance is dependent only on GetHashCode implementation for String. You may get different results for different framework versions (from MSDN):

The behavior of GetHashCode is dependent on its implementation, which might change from one version of the common language runtime to another. A reason why this might happen is to improve the performance of GetHashCode.

empi
It shouldn't be thats true, and everyone so far involved knows that. But is it? Is GetHashCode function itself slower when there are spaces in the string. It would be very strange if it were but there is only one way to know with reasonable certainty.
AnthonyWJones
Just added some information that if anyone wants to test it, should test it on more than one framework version.
empi
@Anthony: The current implementation of `GetHashCode` iterates through the string, performing some calculations on the character codes as it goes. The only way that spaces could make any difference would be if integer arithmetic was somehow slower when dealing with the number 32!
LukeH
+1  A: 

White space increases the length of the string slowing the hash function but I expect this to be really insignificant. On the other side, leaving the white spaces in the string can lead to a better hash with less collisions. So I don't think there's any problem using a string with spaces in a HashTable.

bruno conde
+5  A: 

Straight from the Rotor source, the core of the String.GetHashcode method:

                int     c;
                char *s = src;
                while ((c = s[0]) != 0) {
                    hash1 = ((hash1 << 5) + hash1) ^ c;
                    c = s[1];
                    if (c == 0)
                        break;
                    hash2 = ((hash2 << 5) + hash2) ^ c;
                    s += 2;
                }

What I can make up of this: spaces do not get any special treatment.

Conclusion:

  • The third party does not use HashTable or wraps something around to string to make spaces slower.
  • Or they are trying to obfuscate their implementation by telling stories.
GvS
Straight from the source, I like that! Thanks
michielvoo