views:

110

answers:

5

I have a class that internally is just an array of integers. Once constructed the array never changes. I'd like to pre-compute a good hashcode so that this class can be very efficiently used as a key in a Dictionary. The length of the array is less than about 30 items, and the integers are between -1000 and 1000 in general.

A: 

Hi,

I think choosing a good hash-algorithm would have to be based on the distribution (in a probability sense) of the integer values.

Have a look at Wikipedia for a list of algorithms

S.C. Madsen
+1  A: 

Not very clever, but sufficient for most practical purposes:

EDIT: changed due to comment of Henk Holterman, thanks for that.

int hc=array.Length;
for(int i=0;i<array.Length;++i)
{
     hc=unchecked(hc*314159 +array[i]);
}
return hc;

If you need something more sophisticated, look here.

Doc Brown
Looks OK, but 314159 might be a bit large. A number like 17 or 31 would do nicely too. And: `hc=unchecked(hc*SHIFTVAL+array[i]);` to be independent of compiler settings.
Henk Holterman
Yes, one can improve that surely in many different ways, upvoted your comment.
Doc Brown
@Doc, the finer points are indeed not to relevant but I would strongly advice the `unchecked()` operator.
Henk Holterman
A: 

Any CRC (or even XOR) should be ok.

leppie
The XOR would never shift outside the -/+ 1000 window
Henk Holterman
@Henk Holterman: Sorry, I dont get it. You will still have 10 bits of valid CRC if the values are limited. Edit: Actually the rest of the bits would flip depending on sign.
leppie
CRC is OK but overkill, simply XOR-ing the values (without shifting) is not OK.
Henk Holterman
A: 

For an array of values generally between -1000 and 1000, I would probably use something like this:

static int GetHashCode(int[] values)
{
   int result = 0;
   int shift = 0;
   for (int i = 0; i < values.Length; i++)
   {
      shift = (shift + 11) % 21;
      result ^= (values[i]+1024) << shift;
   }
   return result;
}
BlueMonkMN
FYI, I chose the number 11 because 11 bits is what is necessary to store a range of 2048 distinct values (-1000 to +1000 is 2000, which is close). I chose the number 21 because 32-bit integer minus 11 bits equals 21 bits. Shifting left 21 bits will leave 11 bits to contain a value from 0 to 2048.
BlueMonkMN
+1  A: 

You may use CRC32 checksum. Here is the code:

[CLSCompliant(false)]
public class Crc32 {
    uint[] table = new uint[256];
    uint[] Table { get { return table; } }

    public Crc32() {
        MakeCrcTable();
    }
    void MakeCrcTable() {
        for (uint n = 0; n < 256; n++) {
            uint value = n;
            for (int i = 0; i < 8; i++) {
                if ((value & 1) != 0)
                    value = 0xedb88320 ^ (value >> 1);
                else
                    value = value >> 1;
            }
            Table[n] = value;
        }
    }
    public uint UpdateCrc(uint crc, byte[] buffer, int length) {
        uint result = crc;
        for (int n = 0; n < length; n++) {
            result = Table[(result ^ buffer[n]) & 0xff] ^ (result >> 8);
        }
        return result;
    }
    public uint Calculate(Stream stream) {
        long pos = stream.Position;
        const int size = 0x32000;
        byte[] buf = new byte[size];
        int bytes = 0;
        uint result = 0xffffffff;
        do {
            bytes = stream.Read(buf, 0, size);
            result = UpdateCrc(result, buf, bytes);
        }
        while (bytes == size);
        stream.Position = pos;
        return ~result;
    }
}
osprey
That seems overly complex for an array of ~30 integers from -1000 to 1000. It requires converting the array of integers into an array of bytes or a stream first because there's no function that accepts an array of integers as an input, right?
BlueMonkMN
It is easy to convert each int to byte[]:int value = 0;byte[] bytes = BitConverter.GetBytes(value);These bytes may be used to calculate checksum instead of bytes read from stream.
osprey
Yes, but you neglected the fact that you have to convert the whole array to bytes. That too is easy, but still ends up being some significant overhead in code complexity and at runtime relative to a solution specifically targeted at hashing an array of integers directly.
BlueMonkMN