views:

232

answers:

5

It is well known that comparing floats by == is usually a mistake. In a 3D-vector class (with float components X, Y, Z) i wrote, two vectors are considered equal if their distance is considered zero.

 public override bool Equals(object obj)
 {
  if (obj == null) {
   return false;
  }

  if (GetType () != obj.GetType ()) {
   return false;
  }

  float d = DistSq ((Vec) obj);

  return IsConsideredZero (d);
 }

 public float DistSq(Vec p)
 {
  Vec d = this - p;
  return d.LengthSq ();
 }

 public float LengthSq()
 {
  return X * X + Y * Y + Z * Z;
 }

 private const float VEC_COMPARE_EPSILON_ABS = 1E-05f;
 public static bool IsConsideredZero(float f)
 {
  return Math.Abs (f) < VEC_COMPARE_EPSILON_ABS;
 }

So far, everything worked fine. However, now i'd like to get a hashcode of the vector. I can see that something like hash = (int)X^(int)Y^(int)Z is bound to fail.

The best i could come up with was:

 public override int GetHashCode()
 {
  return 0;
 }

This, of course, kind of sucks. Is there any way to get a reasonable hashcode? NaNs and other special values are possible, but unlikely, in case that is important.

+4  A: 

It's impossible assuming you want to have the normal hashcode/equality properties:

  • If X = Y and Y = Z then X = Z (transitivity)
  • If X = Y then Y = X (commutivity)
  • X = X for all X (reflexivity)

The first rule is the problem - because if each value is deemed "equal" to the next greater representable number, you end up with all numbers being equal. For instance, suppose a number is deemed equal to another they're within 0.1:

0 equals 0.08 0.08 equals 0.16 0.16 equals 0.24

=> 0 equals 0.16 by the transitivity rule => 0 equals 0.24 by the transitivity rule

(etc)

If you ignore the transitivity rule, then you still (presumably) want "equal" values to have equal hashcodes. This effectively enforces the transitivity rule - in the above example, 0 and 0.08 have to have equal hashcodes, as do 0 and 0.16. Therefore 0 and 0.16 have to have equal hashcodes, and so on. Therefore you can have no useful hashcode - it has to be a constant.

Jon Skeet
+5  A: 

I'm afraid it is not in the general case. A sketch of a proof goes like this:

Take any two numbers a and b. Let the difference between them be d. Then if you create the d/epsilon numbers with an epsilon step in between, each step must be "equal" to the step before, which by hashcode semantics have the same hashcode. So all numbers must have the same hashcode.

You can only solve this problem if you add some other constraint.

As an aside, you definition of Equals is wrong as well, as it can be true that a.Equals(b) and b.Equals(c) but not a.Equals(c), which is wrong for equals. This is known as breaking the Transitivity property.

What can I do then?

The solution depends on what you are using the hash for. One solution would be to introduce a conceptual grid. Change the equals and hashcode so two numbers are equal if in the same grid cube, by rounding to a constant number of decimal places, then taking equals and hashcode on the rounded number. If being close to zero is an important case, add a offset of epsilon/2 before rounding, so zero is the centre of the cube. This is correct, but you can have two numbers arbitrarily close together (under the limits of float) without being equal. So for some applications it will be ok, others it won't be. This is similar to an idea from mghie.

Nick Fortescue
+1 for your answer, and thanks for showing me the error of my way.
mghie
+6  A: 

I don't think you can have a hashcode that is consistent with your comparison method because the latter is not transitive: for any three vectors A, B, C, if A.Equals(B) and B.Equals(C) are true, it could still be the case that A.Equals(C) is false. (Imagine if the distance between A and B is 6e-6, between B and C is 6e-6, and between A and C is 1.2e-5) But equality of hashcodes is always transitive, since they're just numbers.

In this case, I'd just create a hashcode method that computes the hash based on the exact values of the floating-point coordinates, and mention in the documentation that it's inconsistent with equals. I know it's not really a solution but given that I don't think a real solution exists, it's better to have a nontrivial hashcode than just 0.

David Zaslavsky
+2  A: 

Everybody is correct ...

HOWEVER, one thing that is often done is to extend the concept of hash a bit. Consider a partition of your 3d space with boxes with a side >> epsilon.

The hash of a point is the box it belongs to. When you want to lookup for a point, you don't check for the point with the corresponding box (as you would do for a regular hash) but for the neighboring boxes as well. In 3d you should get away with max 8 boxes.

poulejapon
+2  A: 

Whatever technique you use will have problems because you posed something that isn't possible to solve.

What you want is 1) evenly distributed hash such that for most numbers a and b where a != b then a.GetHashCode() != b.GetHashCode() but 2) where a == b then a.GetHashCode() == b.GetHashCode() must be true.

Returning a constant fulfills (2) but not (1).

You can demonstrate that rounding at 1E-5 boundaries and using that as a hash violates fulfills (1) but violates (2). Take 1E-5 and 2E-5, for example. Rounding would produce two different hash values but they compare equal. This violates constraint (2) above. You can easily generalize this to prove that any rounding of the number will run into a similar problem.

I recommend you choose a different approach. I assume the underlying problem is determining if some point is close to a point you already have. I recommend recusively dividing the coordinate space in half (where points along the boundary (i.e. <=1E-5 from a boundary) in both halves). If you progressively divide you space (think binary tree) you can construct a data structure that will quickly return the result you want and be fairly easy to construct.

If I missed my guess and you must use a hash then can do what you want with two hash values each rounding to 1E-5 but offset by 5E-6. All equal points will compare equal on one of the two hash values. This would require you to enter point in the hash table twice, once for each hash routine.

chuckj