This question is similar to the one here.

We all know what PointF is, don't we? This is the data structure:

public struct PointF
{
  public float X;
  public float Y;
}

How can I implement IEqualityComparer<PointF> with a tolerance? Let's say my Equals code looks like this:

public const float Epsilon = 0.01f; // say
public bool Equals(PointF pt1, PointF pt2)
{
   return Math.Abs(pt1.X-pt2.X)<Epsilon && Math.Abs(pt1.Y-pt2.Y)<Epsilon;
}

Question: How do I implement a correct GetHashCode, so that lookups in a dictionary keyed by PointF find the right element?

I've been racking my brain over this for a few days but still can't find a satisfactory solution.

+6  A: 

Instead of defining the tolerance by the distance, you could place the points in a grid.
If two points are in the same cell, they're considered equal and have the same hash code.

public bool Equals(PointF pt1, PointF pt2)
{
   return GetCell(pt1.X) == GetCell(pt2.X)
       && GetCell(pt1.Y) == GetCell(pt2.Y);
}

public int GetHashCode(PointF pt)
{
   return GetCell(pt.X) ^ GetCell(pt.Y);
}

private static int GetCell(float f)
{
    return (int)Math.Floor(f / 10); // cell size is 10 pixels; Floor keeps cells the same width on both sides of zero
}
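For reference, here is a minimal sketch of how the grid idea above might look as a complete IEqualityComparer<PointF> used with a Dictionary. The class name GridPointComparer, the cellSize constructor parameter, the PointF constructor, and the multiplier 397 are all assumptions made for illustration; none of them come from the answer itself.

```csharp
using System;
using System.Collections.Generic;

// Same fields as the PointF struct from the question; the constructor is added for convenience.
public struct PointF
{
    public float X;
    public float Y;

    public PointF(float x, float y) { X = x; Y = y; }
}

// Hypothetical name: treats two points as equal when they fall into the same grid cell.
public sealed class GridPointComparer : IEqualityComparer<PointF>
{
    private readonly float cellSize;

    public GridPointComparer(float cellSize)
    {
        if (!(cellSize > 0)) throw new ArgumentOutOfRangeException(nameof(cellSize));
        this.cellSize = cellSize;
    }

    public bool Equals(PointF a, PointF b)
    {
        return GetCell(a.X) == GetCell(b.X) && GetCell(a.Y) == GetCell(b.Y);
    }

    public int GetHashCode(PointF p)
    {
        // Multiplying one index by a prime avoids the symmetric collisions
        // of a plain XOR, e.g. between cells (1, 2) and (2, 1).
        unchecked { return (GetCell(p.X) * 397) ^ GetCell(p.Y); }
    }

    private int GetCell(float f)
    {
        // Floor (not truncation) keeps cells the same width on both sides of zero.
        return (int)Math.Floor(f / cellSize);
    }
}

public static class GridDemo
{
    public static void Main()
    {
        var map = new Dictionary<PointF, string>(new GridPointComparer(10f));
        map[new PointF(12.5f, 3.0f)] = "first";

        // Same cell (1, 0) as the stored key: found.
        Console.WriteLine(map.ContainsKey(new PointF(19.9f, 9.9f))); // True
        // Very close to (19.9, 9.9), but in the neighboring cell (2, 0): not found.
        Console.WriteLine(map.ContainsKey(new PointF(20.1f, 9.9f))); // False
    }
}
```

The second lookup shows the trade-off of this approach: equality is consistent and hashable, but two points on opposite sides of a cell border compare as different no matter how close they are.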

Thesis: There is no implementation of Equals and GetHashCode that meets your requirements.

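Proof: Consider the following three points, A, B, and C, where B lies within the tolerance of both A and C, but A and C are farther apart than the tolerance:

[Illustration: points A, B, and C]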

As per your requirements,

Equals(A, B) == true              // (i)
Equals(B, C) == true              // (ii)
Equals(A, C) == false             // (iii)
GetHashCode(A) == GetHashCode(B)  // (iv)
GetHashCode(B) == GetHashCode(C)  // (v)
GetHashCode(A) != GetHashCode(C)  // (vi)

But from (iv) and (v) it follows that

GetHashCode(A) == GetHashCode(C)

and thereby

Equals(A, C) == true

which contradicts (iii) and (vi).

Since Equals and GetHashCode cannot return different values for the same arguments, there is no implementation that meets your requirements. q.e.d.
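Conditions (i) to (iii) can be checked with concrete numbers. The following is only a sketch: the coordinates, the helper name NearlyEquals, and the PointF constructor are assumptions chosen for illustration, using the Epsilon and the comparison from the question.

```csharp
using System;

// Same fields as the PointF struct from the question; the constructor is added for convenience.
public struct PointF
{
    public float X;
    public float Y;

    public PointF(float x, float y) { X = x; Y = y; }
}

public static class TransitivityDemo
{
    public const float Epsilon = 0.01f;

    // The tolerance-based comparison from the question (hypothetical name).
    public static bool NearlyEquals(PointF pt1, PointF pt2)
    {
        return Math.Abs(pt1.X - pt2.X) < Epsilon && Math.Abs(pt1.Y - pt2.Y) < Epsilon;
    }

    public static void Main()
    {
        // Illustrative coordinates: neighbors are 0.008 apart, the ends 0.016 apart.
        var a = new PointF(0.000f, 0f);
        var b = new PointF(0.008f, 0f);
        var c = new PointF(0.016f, 0f);

        Console.WriteLine(NearlyEquals(a, b)); // True  (i)
        Console.WriteLine(NearlyEquals(b, c)); // True  (ii)
        Console.WriteLine(NearlyEquals(a, c)); // False (iii)
    }
}
```

The tolerance-based comparison is not transitive, so it is not an equivalence relation, and that is what a hash-based lookup relies on.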

dtb
This is a good reformulation +1.
Vinko Vrsalovic
This is a *no-no*. Suppose two points are in different grid cells but infinitesimally close to each other: by your definition they are not the same, whereas by any other definition they are.
Ngu Soon Hui
@Ngu That's a no-no in your particular use case, but there are plenty of valid use cases for grids.
Vinko Vrsalovic
+1 Probably the only easily implemented solution that gives predictable results. For a single pair of points, this approach and the original (author's) approach can give different results: two points near the border between neighboring cells can land in different cells even though their distance is within the tolerance. But for a large number of points, dtb's solution is deterministic.
Audrius
If you have multiple sets of points, then give them a container that tracks the origin for the grid; that way you can adjust for the different coordinate spaces (and indeed tolerances) so that this approach actually works. The only other approach you have is to deep scan the whole world each time to find those within the tolerance. One optimisation of that would then be to bisect groups of points into squares or polygonal regions based on locality.
Andras Zoltan
loving the graphics there dtb :)
Andras Zoltan
Thanks for the proof
Ngu Soon Hui
Your proposition (vi) is incorrect... there's no requirement for unequal objects to have different hash-codes. A valid implementation of `GetHashCode` would be `return 0`!
Ben Lings
@Ben Lings: Good catch. However, the point of implementing GetHashCode is to be able to use the data type as a key in a dictionary in order to get (almost) constant lookup complexity. If GetHashCode returns the same hash code for all values (and there is no other valid implementation), lookup complexity becomes linear, which defeats the purpose of using a dictionary. You could equally well use a list of key-value pairs instead and wouldn't need a GetHashCode implementation at all. So, while you're technically right, your insight does not open up any new way to solve the problem. :-)
dtb
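The list-of-key-value-pairs alternative mentioned in the last comment could look roughly like this. It is a minimal sketch, not a recommendation: the type name TolerantPointList, its members, and the PointF constructor are hypothetical, and the lookup is linear in the number of stored points.

```csharp
using System;
using System.Collections.Generic;

// Same fields as the PointF struct from the question; the constructor is added for convenience.
public struct PointF
{
    public float X;
    public float Y;

    public PointF(float x, float y) { X = x; Y = y; }
}

// Hypothetical helper: key-value pairs searched linearly with the question's tolerance test.
public sealed class TolerantPointList<TValue>
{
    public const float Epsilon = 0.01f;

    private readonly List<KeyValuePair<PointF, TValue>> items =
        new List<KeyValuePair<PointF, TValue>>();

    public void Add(PointF key, TValue value)
    {
        items.Add(new KeyValuePair<PointF, TValue>(key, value));
    }

    // O(n): returns the first stored entry within Epsilon of the key on both axes.
    public bool TryGetValue(PointF key, out TValue value)
    {
        foreach (var item in items)
        {
            if (Math.Abs(item.Key.X - key.X) < Epsilon &&
                Math.Abs(item.Key.Y - key.Y) < Epsilon)
            {
                value = item.Value;
                return true;
            }
        }
        value = default(TValue);
        return false;
    }
}

public static class ListDemo
{
    public static void Main()
    {
        var list = new TolerantPointList<string>();
        list.Add(new PointF(1.000f, 2.000f), "a");

        string found;
        Console.WriteLine(list.TryGetValue(new PointF(1.005f, 1.995f), out found)); // True
        Console.WriteLine(list.TryGetValue(new PointF(1.020f, 2.000f), out found)); // False
    }
}
```

Because the comparison is not transitive, a query point can be within tolerance of several stored points; this sketch simply returns the first match in insertion order.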
A: 

I don't think it's possible, because you could have an infinite sequence of values in which each value is equal (within tolerance) to the previous and next value in the sequence but not to any other value, and GetHashCode would need to return an identical value for all of them.
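A small sketch of that chain argument, using single float values and the question's Epsilon of 0.01. The step size of 0.005 and all names below are assumptions chosen for illustration.

```csharp
using System;

public static class ChainDemo
{
    public const float Epsilon = 0.01f;

    // Tolerance comparison on a single coordinate (hypothetical helper).
    public static bool NearlyEquals(float a, float b)
    {
        return Math.Abs(a - b) < Epsilon;
    }

    // Builds the values 0, step, 2 * step, ...
    public static float[] BuildChain(int count, float step)
    {
        var chain = new float[count];
        for (int i = 0; i < count; i++) chain[i] = i * step;
        return chain;
    }

    // True when every value is within tolerance of its successor.
    public static bool IsLinked(float[] chain)
    {
        for (int i = 0; i + 1 < chain.Length; i++)
        {
            if (!NearlyEquals(chain[i], chain[i + 1])) return false;
        }
        return true;
    }

    public static void Main()
    {
        float[] chain = BuildChain(201, 0.005f); // roughly 0.000, 0.005, ..., 1.000

        // Every neighbor pair is "equal", so all 201 values would need one hash code...
        Console.WriteLine(IsLinked(chain)); // True
        // ...even though the two ends are nowhere near each other.
        Console.WriteLine(NearlyEquals(chain[0], chain[200])); // False
    }
}
```

The chain can be extended as far as the float range allows, so the only hash code consistent with every link is one shared by all values.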

Pent Ploompuu