views:

69

answers:

2

I need to generate a fast hash code in GetHashCode for a BitArray. I have a Dictionary where the keys are BitArrays, and all the BitArrays are of the same length.

Does anyone know of a fast way to generate a good hash from a variable number of bits, as in this scenario?

UPDATE:

The approach I originally took was to access the internal array of ints directly through reflection (speed is more important than encapsulation in this case), then XOR those values. The XOR approach seems to work well i.e. my 'Equals' method isn't called excessively when searching in the Dictionary:

    public int GetHashCode(BitArray array)
    {
        int hash = 0;
        foreach (int value in array.GetInternalValues())
        {
            hash ^= value;
        }
        return hash;
    }

However, the approach suggested by Mark Byers and seen elsewhere on StackOverflow was slightly better (16570 Equals calls vs 16608 for the XOR for my test data). Note that this approach fixes a bug in the previous one where bits beyond the end of the bit array could affect the hash value. This could happen if the bit array was reduced in length.

    public int GetHashCode(BitArray array)
    {
        UInt32 hash = 17;
        int bitsRemaining = array.Length;
        foreach (int value in array.GetInternalValues())
        {
            UInt32 cleanValue = (UInt32)value;
            if (bitsRemaining < 32)
            {
                //clear any bits that are beyond the end of the array
                int bitsToWipe = 32 - bitsRemaining;
                cleanValue <<= bitsToWipe;
                cleanValue >>= bitsToWipe;
            }

            hash = hash * 23 + cleanValue;
            bitsRemaining -= 32;
        }
        return (int)hash;
    }

The GetInternalValues extension method is implemented like this:

public static class BitArrayExtensions
{
    static FieldInfo _internalArrayGetter = GetInternalArrayGetter();

    static FieldInfo GetInternalArrayGetter()
    {
        return typeof(BitArray).GetField("m_array", BindingFlags.NonPublic | BindingFlags.Instance);
    }

    static int[] GetInternalArray(BitArray array)
    {
        return (int[])_internalArrayGetter.GetValue(array);
    }

    public static IEnumerable<int> GetInternalValues(this BitArray array)
    {
        return GetInternalArray(array);
    }

... more extension methods
}

Any suggestions for improvement are welcome!

+1  A: 

If the bit arrays are 32 bits or shorter then you just need to convert them to 32 bit integers (padding with zero bits if necessary).

If they can be longer then you can either convert them to a series of 32-bit integers and XOR them, or better: use the algorithm described in Effective Java.

public int GetHashCode()
{
    int hash = 17;
    hash = hash * 23 + field1.GetHashCode();
    hash = hash * 23 + field2.GetHashCode();
    hash = hash * 23 + field3.GetHashCode();
    return hash;
}

Taken from here. The field1, field2 correcpond the the first 32 bits, second 32 bits, etc.

Mark Byers
I have seen your approach mentioned elsewhere, but I don't really understand the theory behind it or the selection of the 'magic' primes. This approach was slightly more effective than the XOR approach I originally took (16570 Equals calls vs 16608 for the XOR for my test data). See my edit for more detail.
bart
+2  A: 

It is a terrible class to act as a key in a Dictionary. The only reasonable way to implement GetHashCode() is by using its CopyTo() method to copy the bits into a byte[]. That's not great, it creates a ton of garbage.

Beg, steal or borrow to use a BitVector32 instead. It has a good implementation for GetHashCode(). If you've got more than 32 bits then consider spinning your own class so you can get to the underlying array without having to copy.

Hans Passant
I need more than 32 bits. I was considering writing my own class (with some help from Reflector), but it seems a shame to not take advantage of the built in BitArray. A little reflection hacking got me the internal array, which of course could change in future versions of the framework - e.g. a 64 bit version may be more efficient on 64 bit hardware. I'm happy with that solution for now, though.
bart