tags:

views:

1102

answers:

8

Hi,

Is the return value of GetHashCode() guaranteed to be consistent assuming the same string value is being used? (C#/ASP.NET)

I uploaded my code to a server today and to my surprise I had to reindex some data because my server (win2008 64-bit) was returning different values compared to my desktop computer.

+16  A: 

If I'm not mistaken, GetHashCode is consistent given the same value, but it is NOT guaranteed to be consistent across different versions of the framework.

From the MSDN docs on String.GetHashCode():

The behavior of GetHashCode is dependent on its implementation, which might change from one version of the common language runtime to another. A reason why this might happen is to improve the performance of GetHashCode.

Jonas
+1 for providing an answer to the bug I was just working on... :)
NM
A: 

I wonder if there are differences between 32-bit and 64-bit operating systems, because I am certain both my server and home computer are running the same version of .NET

I was always weary of using GetHashCode(), it might be a good idea for me to simply role my own hash algorithm. Well at least I ended up writing a quick re-index .aspx page because of it.

public static
A: 

Are you running Win2008 x86 as your desktop? Because Win2008 includes version 2.0.50727.1434, which is an updated version of 2.0 included in Vista RTM.

Mark Brackett
A: 

Not a direct answer to your question, which Jonas has answered well, however this may be of assistance if you are worried about equality testing in hashes

From our tests, depending on what you are requiring with hashcodes, in C#, hashcodes do not need to be unique for Equality operations. As an example, consider the following:

We had a requirement to overload the equals operator, and therefore the GetHashCode function of our objects as they had become volatile and stateless, and sourcing themselves directly from data, so in one place of the application we needed to ensure that an object would be viewed as equal to another object if it was sourced from the same data, not just if it was the same reference. Our unique data identifiers are Guids.

The equals operator was easy to cater for as we just checked on the Guid of the record (after checking for null).

Unfortuantely the HashCode data size (being an int) depends on the operating system, and on our 32 bit system, the hashcode would be 32 bit. Mathematically, when we override the GetHashCode function, it is impossible to generate a unique hashcode from a guid which is greater than 32 bit (look at it from the converse, how would you translate a 32 bit integer into a guid?).

We then did some tests where we took the Guid as a string and returned the HashCode of the Guid, which almost always returns a unique identifier in our tests, but not always.

What we did notice however, when an object is in a hashed collection object (a hashtable, a dictionary etc), when 2 objects are not unique but their hashcodes are, the hashcode is only used as a first option lookup, if there are non-unique hash codes being used, the equality operator is always used as a fall back to detirmine equality.

As I said this may or may not be relevant to your situation, but if it is it's a handy tip.

UPDATE

To demonstrate, we have a Hashtable:

Key:Object A (Hashcode 1), value Object A1

Key:Object B (Hashcode 1), value Object B1

Key:Object C (Hashcode 1), value Object C1

Key:Object D (Hashcode 2), value Object D1

Key:Object E (Hashcode 3), value Object E1

When I call the hashtable for the object with the key of Object A, the object A1 will be returned after 2 steps, a call for hashcode 1, then an equality check on the key object as there is not a unique key with the hashcode 1

When I call the hashtable for the object with the key of Object D, the object D1 will be returned after 1 step, a hash lookup

johnc
A: 

Hi John,

What we did notice however, when an object is in a hashed collection object (a hashtable, a dictionary etc), when 2 objects are not unique but their hashcodes are, the hashcode is only used as a first option lookup, if there are non-unique hash codes being used, the equality operator is always used as a fall back to detirmine equality.

This is the way hash lookups work, right? Each bucket contains a list of items having the same hash code.

So to find the correct item under these conditions a linear search using value equality comparison takes place.

And if your hashing implementation achieves good distribution, this search is not required, i.e., one item per bucket.

Is my understanding correct?

Ben, from our test, this is true. The second equality search only runs as required. You can test it yourself by overloading the ==, !=, Equals() and GetHashCode of a certain object. I found it very interesting (but I am a geek :) )
johnc
(continued), so the knock on effect of nonunique hash codes would be slower performance to run the equality check, but in our situation where the nonunique value is very rare, it is largely insignificant
johnc
+3  A: 

The implementation is dependent on the version of the framework but it also depends on the architecture. The implementation of string.GetHashCode() is dfferent in the x86 and x64 versions of the framework even if they have the same version number.

Jack Bolding
+4  A: 

I had a similar problem where I filled a database table with information which was dependent on String.GetHashCode (Not the best idea) and when I upgraded the server I was working on to x64 I noticed the values I was getting from String.GetHashCode were inconsistent with what was already in the table. My solution was to use my own version of GetHashCode which returns the same value as String.GetHashCode on a x86 framework.

Here's the code, don't forget to compile with "Allow unsafe code":

    /// <summary>
    /// Similar to String.GetHashCode but returns the same as the x86 version of String.GetHashCode for x64 and x86 frameworks.
    /// </summary>
    /// <param name="s"></param>
    /// <returns></returns>
    public static unsafe int GetHashCode32(string s)
    {
        fixed (char* str = s.ToCharArray())
        {
            char* chPtr = str;
            int num = 0x15051505;
            int num2 = num;
            int* numPtr = (int*)chPtr;
            for (int i = s.Length; i > 0; i -= 4)
            {
                num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
                if (i <= 2)
                {
                    break;
                }
                num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
                numPtr += 2;
            }
            return (num + (num2 * 0x5d588b65));
        }
    }
A: 

I would have to Say...you cannot rely on it. For example if I run file1 through c#'s md5 hash code and copy nd paste the same file to a new directory...the hash code come out different even tough it is he same file. Obviously its the same .net version, same everything. The only thing that changed was the path.

greg