I'm looking for the optimal way to compute a hashcode for a set of two-dimensional points (so that I can store polygons in a hashtable).

There are some obvious ways to do that, such as concatenating all the point coordinates into a string and taking its hashcode, but this would be very slow.

At the other end of the speed/collision spectrum, I could, for example, sum up all the coordinates, which would be very fast but would also create a lot of collisions.
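As a rough illustration of why plain summing collides so often, here is a minimal C sketch, assuming integer coordinates (the `Point` struct and `sum_hash` are hypothetical names). Any two point sets with the same coordinate total get the same hash: {(1,2),(3,4)}, {(2,1),(4,3)} and {(0,0),(0,10)} all hash to 10.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical point type with integer coordinates. */
typedef struct { int32_t x; int32_t y; } Point;

/* Sum of all coordinates, wrapping modulo 2^32: fast, but collision-prone. */
uint32_t sum_hash(const Point *pts, size_t n)
{
    uint32_t h = 0;
    for (size_t i = 0; i < n; i++)
        h += (uint32_t)pts[i].x + (uint32_t)pts[i].y;
    return h;
}
```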

What's the optimal way to compute a hashcode for a set of points?

Is the optimal solution different if the coordinates are integers rather than real numbers?

Edit: I'm using .NET, so the hashcode has to be 32 bits long.

A: 

What is optimal depends on what you require from the hash computation.

Better performance will come at the cost of more hash collisions.

Do you have a hard bound on either one? It's going to come down to a mathematical analysis of how much each percent of hash collisions is going to cost you in terms of performance.

Yuval A
No hard bounds. Now that I have specified that the hash size is 32 bits, "optimal" means something, right?
Brann
A:

There is no optimal way to do this job; it all depends on how large a hash you can afford. You have to make trade-offs between speed and diffusion. Keep in mind that there is no such thing as an optimal solution if you do not know exactly what you are going to hash. In some cases, XOR can be good enough.

Take, for instance, this code:

unsigned int JSHash(const char* str, unsigned int len)
{
    unsigned int hash = 1315423911;   /* fixed non-zero starting value */
    unsigned int i    = 0;

    /* mix each input byte into the running hash */
    for(i = 0; i < len; str++, i++)
    {
        hash ^= ((hash << 5) + (*str) + (hash >> 2));
    }

    return hash;
}
/* End Of JS Hash Function */

You said that aggregating the points together is too slow. If you adapt the code above, it does not need any kind of aggregation: you just pass through the coordinates once (not much different from summing them). And if you are using integers or floats, you would probably adjust the shifts (<< and >> are shift operations, which together mix the bits somewhat like a bitwise rotation) to fit your data type.
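A minimal sketch of that adaptation in C, assuming 32-bit integer coordinates (the `Point` struct and `js_hash_points` are hypothetical names): each coordinate is fed straight into the JS Hash mixing step instead of the characters of a string, so no string ever has to be built. Note that the result depends on the order of the points, so the same polygon listed from a different starting vertex will generally hash differently.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical point type with 32-bit integer coordinates. */
typedef struct { int32_t x; int32_t y; } Point;

/* JS Hash mixing step applied to coordinates instead of string bytes. */
uint32_t js_hash_points(const Point *pts, size_t n)
{
    uint32_t hash = 1315423911u;   /* same starting value as JSHash */
    for (size_t i = 0; i < n; i++) {
        /* conversion to uint32_t is well-defined for negative values */
        hash ^= (hash << 5) + (uint32_t)pts[i].x + (hash >> 2);
        hash ^= (hash << 5) + (uint32_t)pts[i].y + (hash >> 2);
    }
    return hash;
}
```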

Check for other hash functions here: http://www.partow.net/programming/hashfunctions/

ralu
A: 

If your data set happens to consist of polygons that may share edges but otherwise do not overlap, you only need to hash three points of each polygon to avoid collisions.

Edit: Reconsidering this and picturing the possible collisions along concave/convex boundaries, you might as well assume your polygons overlap. Sigh.

Alas: When the convex and the concave meet, it always gets me into trouble. :-P

Anon
A: 

Alternatively, you can just XOR the hashes of the individual points.

return p1.GetHashCode() ^ p2.GetHashCode();

Depending on what the values are going to be, you could probably just add them instead.
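A minimal C sketch of the XOR approach for a whole polygon rather than two points (the `Point` struct, `point_hash` and `xor_hash` are hypothetical names; `point_hash` is just one simple way to mix x and y). Because XOR is commutative, the result does not depend on the starting vertex or the direction. One caveat: a point that appears twice cancels itself out, so {(7,8),(7,8)} hashes to 0, the same as an empty polygon.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical point type with 32-bit integer coordinates. */
typedef struct { int32_t x; int32_t y; } Point;

/* Hypothetical per-point hash: weights x so that (a,b) and (b,a) usually differ. */
static uint32_t point_hash(Point p)
{
    return (uint32_t)p.x * 31u + (uint32_t)p.y;
}

/* XOR of the per-point hashes: independent of point order. */
uint32_t xor_hash(const Point *pts, size_t n)
{
    uint32_t h = 0;
    for (size_t i = 0; i < n; i++)
        h ^= point_hash(pts[i]);
    return h;
}
```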

Noon Silk
A: 

If you want polygons that differ only in being defined clockwise versus anticlockwise (or in their starting point) to compare equal, then you'll have to create a canonicalization function: a function that, given a polygon's points starting from any point and in either direction, returns the points in the same order.

One algorithm that I can think of is to find the lexicographically smallest of all possible sequences of points:

  1. Find the set of top-leftmost points (the points with minimum x among those with minimum y); these are the starting points.
  2. For each starting point and each direction, iteratively add the connected points in that direction and eliminate every candidate whose newly added point isn't top-leftmost in the current iteration. Halt when only one (starting point, direction) pair is left or when n-1 iterations are completed. If more than one pair remains, choose any; they all produce the same sequence.
  3. Reorder the points starting from the chosen point in the chosen direction.

This is O(n^2) in the worst case for fully degenerate polygons, but if your polygons have no coincident points, it is O(n) with a pretty small constant factor.
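Here is a minimal C sketch of the canonicalization, assuming the points form a closed ring with integer coordinates and that "top-leftmost" means smallest y, then smallest x (`Point`, `canonicalize` and the helpers are hypothetical names). As a simpler variant of step 2, it compares the candidate (start, direction) pairs against each other instead of eliminating them in lockstep; the result is the same lexicographically smallest sequence, and with a unique top-leftmost point there are only two candidates, so it stays O(n).

```c
#include <stddef.h>

/* Hypothetical point type with integer coordinates. */
typedef struct { int x; int y; } Point;

/* "Top-leftmost first": smaller y wins, then smaller x. */
static int cmp_point(Point a, Point b)
{
    if (a.y != b.y) return a.y < b.y ? -1 : 1;
    if (a.x != b.x) return a.x < b.x ? -1 : 1;
    return 0;
}

/* k-th point (k < n) of the walk starting at index s in direction dir (+1 or -1). */
static Point at(const Point *p, size_t n, size_t s, int dir, size_t k)
{
    size_t idx = dir > 0 ? (s + k) % n : (s + n - k) % n;
    return p[idx];
}

/* Lexicographic comparison of two (start, direction) walks. */
static int cmp_seq(const Point *p, size_t n, size_t s1, int d1, size_t s2, int d2)
{
    for (size_t k = 0; k < n; k++) {
        int c = cmp_point(at(p, n, s1, d1, k), at(p, n, s2, d2, k));
        if (c != 0) return c;
    }
    return 0;
}

/* Writes the canonical ordering of p[0..n-1] into out[0..n-1]. */
void canonicalize(const Point *p, size_t n, Point *out)
{
    if (n == 0) return;
    size_t min = 0;
    for (size_t i = 1; i < n; i++)
        if (cmp_point(p[i], p[min]) < 0) min = i;

    size_t best_s = min;
    int best_d = 1;
    for (size_t s = 0; s < n; s++) {
        if (cmp_point(p[s], p[min]) != 0) continue; /* only top-leftmost starts */
        for (int d = 1; d >= -1; d -= 2) {           /* both directions */
            if (cmp_seq(p, n, s, d, best_s, best_d) < 0) {
                best_s = s;
                best_d = d;
            }
        }
    }
    for (size_t k = 0; k < n; k++)
        out[k] = at(p, n, best_s, best_d, k);
}
```

A unit square listed counterclockwise from (0,0) and the same square listed in the opposite direction from (1,1) both come out as (0,0), (1,0), (1,1), (0,1).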

With the canonicalized order, you can easily compare two polygons for equality: just compare the points one by one. Hashcode calculation is also trivial; use any reasonably robust hash-combining method. For example:

int result = 0;
foreach (var point in this.points) {
    result = (result * 31 + point.X.GetHashCode()) * 31 + point.Y.GetHashCode();
}
Ants Aasma
A: 

For a hash that is very quick to calculate and independent of clockwise/counterclockwise orientation, you would not want to depend on finding a well-defined ordering of the points.

This limits your hash-combining operations to ones that commute. Therefore, we want to keep all data that is independent of orientation separate during the combining operations.

Here is a simple solution:

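Assuming a combine function int -> int -> int that is commutative and associative (so the order in which the points are visited does not matter), either of the following will do to start with:

public static int combine(int h, int x)
{
    return h + x;   // addition commutes; h * 31 + x would not
} 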

public static int combine(int h, int x)
{
    return h ^ x;
}
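To see why the combine step has to commute here, a small C check (unsigned arithmetic so that overflow wraps, as C# `int` does in an unchecked context; the function names are hypothetical). Folding 1, 2, 3 with `h * 31 + x` gives 1026, while folding 3, 2, 1 gives 2946, so that combiner depends on the visiting order; addition and XOR give the same result either way.

```c
#include <stdint.h>

/* Candidate combine functions. */
static uint32_t combine_mul31(uint32_t h, uint32_t x) { return h * 31u + x; } /* order-dependent */
static uint32_t combine_add(uint32_t h, uint32_t x)   { return h + x; }       /* commutative */
static uint32_t combine_xor(uint32_t h, uint32_t x)   { return h ^ x; }       /* commutative */

/* Fold three values, in the given order, starting from 0. */
static uint32_t fold3(uint32_t (*combine)(uint32_t, uint32_t),
                      uint32_t a, uint32_t b, uint32_t c)
{
    return combine(combine(combine(0u, a), b), c);
}
```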

Then we can do the following:

public override int GetHashCode()
{
    int x = 0;
    int y = 0;
    uint count = 0;
    foreach (var p in polygon)   // polygon: this object's collection of points
    {
        x = combine(x, p.X);
        y = combine(y, p.Y);
        count++;
    }
    // simplified, unrolled Murmur2 hash for end stage
    const uint m = 0x5bd1e995;
    const int r = 24;
    uint h = count;   // seed with the number of points
    uint k = ReinterpretInt32ToUInt32(x);
    k *= m;
    k ^= k >> r;
    k *= m;
    h *= m;
    h ^= k;
    k = ReinterpretInt32ToUInt32(y);
    k *= m;
    k ^= k >> r;
    k *= m;
    h *= m;
    h ^= k;
    // avalanche
    h ^= h >> 13;
    h *= m;
    h ^= h >> 15;
    return ReinterpretUInt32ToInt32(h);
}

The code above relies on these helpers to reinterpret the bits between int and uint:

public static uint ReinterpretInt32ToUInt32(int i)
{
    return unchecked((uint)i);   // same bits, no unsafe code needed
}

public static int ReinterpretUInt32ToInt32(uint u)
{
    return unchecked((int)u);
}

This will not be the best hash in terms of collision avoidance, but it should be very fast to calculate, and you may find it sufficient for your needs.
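For comparison, a minimal C port of the same idea (XOR as the combine step, the point count as the seed, and the unrolled Murmur2-style finalization), assuming 32-bit integer coordinates; the `Point` struct and the function names are hypothetical. It shows both sides of the trade-off: listing the points from another vertex or in the other direction gives the same hash, but because x and y are accumulated separately, {(0,0),(1,1)} and {(0,1),(1,0)} also collide.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical point type with 32-bit integer coordinates. */
typedef struct { int32_t x; int32_t y; } Point;

/* Order-independent combine step (XOR); addition would also work. */
static uint32_t combine(uint32_t h, uint32_t v) { return h ^ v; }

/* Murmur2-style mixing of one 32-bit block k into h. */
static uint32_t mix_block(uint32_t h, uint32_t k)
{
    const uint32_t m = 0x5bd1e995u;
    const int r = 24;
    k *= m;
    k ^= k >> r;
    k *= m;
    h *= m;
    h ^= k;
    return h;
}

uint32_t polygon_hash(const Point *pts, size_t n)
{
    uint32_t x = 0, y = 0;
    for (size_t i = 0; i < n; i++) {
        x = combine(x, (uint32_t)pts[i].x);
        y = combine(y, (uint32_t)pts[i].y);
    }
    uint32_t h = (uint32_t)n;   /* seed with the point count */
    h = mix_block(h, x);
    h = mix_block(h, y);
    /* avalanche */
    h ^= h >> 13;
    h *= 0x5bd1e995u;
    h ^= h >> 15;
    return h;
}
```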

ShuggyCoUk
Would whoever downvoted care to comment on why? It seems odd, coming so late...
ShuggyCoUk