tags:

views:

971

answers:

7

I'm implementing a custom GetHashCode for the System.Drawing.Point class in C#. My method currently fails the following requirement:

var hashA = MyGetHashCode(new Point(1, 0));
var hashB = MyGetHashCode(new Point(0, 1));
var hashC = MyGetHashCode(new Point(0, 0));
var hashD = MyGetHashCode(new Point(1, 1));
Assert.AreNotEqual(hashA ^ hashB, hashC ^ hashD);

To pass this test I'm sure that using new SHA256Managed().ComputeHash(currentHash) will do. But there is any other faster hashing algorithm? I know SHA256 is all about security, and I don't need that.

+4  A: 

A simple hash? how about something like:

 (17 * point.X) + (23 * point.Y);

Or for more obvious entropy:

int hash = -1047578147;
hash = (hash * -1521134295) + point.X;
hash = (hash * -1521134295) + point.Y;

(numbers from C#'s anonymous type code)

Marc Gravell
Marc, this will surely fulfill the Assert but is not bounded (large X or Y will overflow)... if you allow overflow then it doesn't have a good distribution
Jorge Córdoba
It will wrap (unchecked); the distribution is, AFAIK, fine... This is the approach used by the C# compiler ;-p
Marc Gravell
Where do you get this constants? BTW Jorge is right.
Jader Dias
Constants R Us.
Will
@Will I didn't get it
Jader Dias
Those numbers are just (as stated) the numbers that the C# compiler will use for an anon type of the form: new {X = 123, Y = 456}
Marc Gravell
+1  A: 

I know this isn't going to answer your question, but I must mention for the sake of other readers that you are changing the default behavior of a built in method of the framework. As per the documentation:
http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx

The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.

Joel Martinez
Could you please use a blockquote, not a code block (prefix the quote with ">" instead of indenting)? It'll make it much easier to read.
Samir Talwar
That is no problem in this case. What they mean is that you should not use the default GetHashcode as a unique value, because different objects can return the same hash. When Jader implements it as a (practically) unique value (by using sha256 or whatever) it will not break anything. It will be just slow ...
tanascius
To be fair, it will be quite slow if he actually has a big hash table of points -- slow enough that it's pretty ridiculous.
mquander
Yeah sure - it is not really clever and he should never use a sha256 or something similar for this purpose. But the statement by Joel is not right, nevertheless.
tanascius
+3  A: 
  • Why are you doing this? Surely System.Drawing.Point has a fine hashing function already?

  • You understand that test doesn't represent a strict requirement, right? Hash codes don't have to be unique.

  • If you really want a really good hash of the coordinates in question, you might want to start at this page about hashing multiple integers.

mquander
Re "fine hashing function" - x ^ y... that isn't great; it means that anything on the diagonal is zero, and anything symmetric, i.e (5,7) and (7,5) - is equal.
Marc Gravell
It's not great, but unless you have a fairly pathological distribution of points, it's OK. I have the feeling the OP isn't working off any concrete performance requirement, if he's considering using an SHA hash, so I doubt anything better is required.
mquander
I answered the first question in the comments of the question.
Jader Dias
+1  A: 

Here's an interesting article about hashing speed:

A Hash Function for Hash Table Lookup

Dana Holt
+1  A: 

A simple Elf hash implementation (it's in delphi, shoudl be easy to translate)

function ElfHash(id : string; tableSize : integer) : integer;
var
  i : integer;
  h,x : longint;
begin
  h := 0;
  // Obtener el valor numérico
  for i := 1 to Length(id) do
  begin
    h := (h shl 4) + Ord(id[i]);

    x := h and $F0000000;
    if x <;>; 0 then
       h = h xor (x shr 24) xor x;
  end;
  // Ajustar al tamaño de la tabla
  result := h mod tableSize;
end;
Jorge Córdoba
I thought I knew delphi, but I have no idea what <;>; means
Jader Dias
:) Talk to the stackoverflow sanitizer code... That's ovbiously "<>"
Jorge Córdoba
A: 

I don't know what your application is, but you may be looking for Zobrist hashing.

http://en.wikipedia.org/wiki/Zobrist_hashing

It can be updated incrementally, which makes it very fast.

Fantius
A: 

If you know in advance that your point value is between 0 and N, you can use hashcode = X+Y*N; This is a rather obvious possible hash. It's not random at all, has ugly repetition, and is generally pretty silly. It's equivalent to concatenating the bits of your two points, assuming N is a power of 2. And it's got perfect uniform distribution and no collisions.

I've used this strategy to excellent effect in the past, but admit it has some real (but obvious) limitations. The biggest one being what happens when N is large enough that N^2 won't fit into your hash value (i.e. painful collisions.

Brian
My current implementation is ((x << 16) | (x >> 16)) ^ y (in C#), that fits in your description
Jader Dias