tags:

views:

324

answers:

4

I am wondering about the hash quality and the hash stability produced by the String.GetHashCode() implementation in .NET?

Concerning the quality, I am focusing on algorithmic aspects (hence, the quality of the hash as it impacts large hash-tables, not for security concerns).

Then, concerning the stability, I wondering about the potential versionning issues that might arise from one .NET version to the next.

Some lights on those two aspects would be very appreciated.

+3  A: 

I can't give you any details about the quality (though I would assume it is pretty good given that string is one of the framework's core classes that is likely to be used as a hash key).

However, regarding the stability, the hash code produced on different versions of the framework is not guaranteed to be the same, and it has changed in the past, so you absolutely must not rely on the hash code being stable between versions (see here for a reference that it changed between 1.1 and 2.0). In fact, it even differs between the 32-bit and 64-bit versions of the same framework version; from the docs:

The value returned by GetHashCode is platform-dependent. For a specific string value, it differs on the 32-bit and 64-bit versions of the .NET Framework.

Greg Beech
A: 

The quality of the hash codes are good enough for their intended purpose, i.e. they doesn't cause too many collisions when you use strings as key in a dictionary. I suspect that it will only use the entire string for calculating the hash code if the string length is reasonably short, for huge strings it will brobably only use the first part.

There is no guarantee for stability across versions. The documentation clearly says that the hashing algorithm may change from one version to the next, so that the hash codes are for short term use.

Guffa
+1  A: 

I just came across a related problem to this. On one of my computers (a 64 bit one) I had a problem that I tracked down to 2 different objects being identical except for the (stored) hashcode. That hashcode was created from a string....the same string!

m_storedhash = astring.GetHashCode();

I dont know how these two objects ended up with different hash codes given they were from the same string however I suspect what happened is that within the same .NET exe, one of the class library projects I depend upon has been set to x86 and another to ANYCPU and one of these objects was created in a method inside the x86 class lib and the other object (same input data, same everything) was created in a method inside the ANYCPU class library.

So, does this sound plausible: Within the same executable in memory (not between processes) some of the code could be running with the x86 Framework's string.GetHashCode() and other code x64 Framework's string.GetHashCode() ?

freddy smith
A: 

I know that this isn't really included the meanings of quality and stability that you specified, but it's worth being aware that hashing extremely large strings can produce an OutOfMemoryException.

https://connect.microsoft.com/VisualStudio/feedback/details/517457/stringcomparers-gethashcode-string-throws-outofmemoryexception-with-plenty-of-ram-available

Matt -The Hat- McVitie