views:

180

answers:

7

I'm creating an application in C# 3.5 that uses the AutoCAD API to read a 2D AutoCAD drawing, make changes to the drawing using defined business logic, then adjust it back in AutoCAD. Due to the nature of the logic, the shape of the drawing has to be re-constructed - e.g. a rectangle is made up of 4 connecting straight lines.

I'm creating these shapes using the start and end co-ordinates of each line from AutoCAD, but some of the co-ordinates don't exactly match up. For example, one point could at 0.69912839 (on one axis), but a line starting from the same point could be 0.69990821. These are in mm so the distance is minute (0.00078mm!)

I've created my own class (call it MyPoint, similar to PointF) because I've needed to add some additional logic to it. In that class I've created a method that takes two doubles and returns true or false depending on if the two points are within 0.001mm of each other. I've then overridden the Equals method, == and != operators so I can do (point1 == point2 or point1.Equals(point2)) which checks if all axis are within 0.001mm of each other - if they are, I class it as being the same point.

That's fine and working brilliantly. Now, I need to check a collection of these point classes to get rid of all duplicates, so I'm using LINQ's Distinct() method on my collection. However this method uses GetHashcode(), not Equals() to determine if the instances are equal. So, I've overriden GetHashcode() which uses the GetHashcode of the double class.

But, the above example fails because obviously they're different values and therefore generate different hashcodes. Is there any way that two numbers that are within 0.001 of each other can generate the same hashcode? (Note the numbers don't know about each other as GetHashcode is called separately on different class instances.) I've tried numerous ways which work for some examples but not for others.

One example is truncating the number to 3dp (multiply it by 10^3, then truncate it) and creating the hashcode on the result - which works for the above example (699 == 699.) But this doesn't work for 0.69990821 and 0.70000120 (699 != 700.) I've tried rounding, which works for the second set of numbers (0.700 == 0.700) but not for the first (0.699 != 0.700.) I've even tried truncating the number to 3dp then adjusting it up to the next even number, which works for both the previous examples, but not for 12.9809 and 12.9818 (12980 != 12982.)

Is there another way, or should I scrap the Equals, ==, != and GetHashcode overrides, and create my own MyPoint.IsEqualTo() and MyPointCollection.Distinct() methods?

A: 

Here's some code to show what I'm doing. Each pair of numbers in "original" should return the same value.

int tolerance = 3;
double[] original = new double[] {
0.69912839,
0.69990821,

0.69990821,
0.70000120,

12.980984087,
12.981808908
};
double[] modified = new double[original.Length];

for (int i = 0; i < original.Length; i++)
{
modified[i] = original[i];

/* Begin number adjustment logic */
modified[i] *= Math.Pow(10, tolerance);
modified[i] = Math.Truncate(modified[i]);

if (modified[i] % 2 != 0)
{
modified[i]++;
}
/* End number adjustment logic */

Console.WriteLine(modified[i]);

if (i % 2 != 0)
{
Console.WriteLine(string.Empty);
}
}

That above method is the "truncate to 3dp then adjust to the nearest even" method. Next is the truncate method (replace code between the begin/end comments):

/* Begin number adjustment logic */
modified[i] *= Math.Pow(10, tolerance);
modified[i] = Math.Truncate(modified[i]);
/* End number adjustment logic */

This is the round method:

/* Begin number adjustment logic */
modified[i] = Math.Round(modified[i], tolerance);
/* End number adjustment logic */
Andy Shellam
+3  A: 

It's unable to write correct hashcode. let proof it: we have 2 points. var a = point1.GetHashCode(); var b = point2.GetHashCode();

if a!= b let create point between point1 and point2. and so on.

After such operations we will create line, where each point is near to some other point and their hashcodes will be the same. So hash code for point1 and point2 should be equals.

So overrtide like this:

public override int GetHashCode()
{
    return 0;
}

and implement you equals.

Steck
I think your conclusion is right, although I'm not sure I follow your logic. The only way to do this is to make ALL hashcodes return the same and rely on the subsequent call to Equals that will be done to determine if they are the same. This obviously means you can't use the class in a hashmap properly as they'll all end up in one bucket but may not be an issue.
Paolo
These won't be used in a hashtable or dictionary as there's nothing unique about the locations to be keyed on - all I need is a list of them and work out which ones are at the same point on one or both axis.
Andy Shellam
This worked fine but like Paolo, I didn't follow your logic, therefore I've marked fortran's answer as it made more sense to me. Thanks.
Andy Shellam
Yeah, if we have to get the same hashcode to the near points we can't give different hashcodes and using hashmap is useless.Try to use something like k-means clustering and give hashcodes to the clusters.
Steck
+1  A: 

I guess that if you return always the same hash (say 0), LinQ will try to compare all elements with equals. After all, a hash is useful to prove that two elements are distinct, not equal.

But anyway, I would recommend you to use better suited structures and algorithms for this domain, like Binary Splitting Partition (BSP) trees (for example).

fortran
Interesting - I was following the guidelines on GetHashcode implementations that state it must return the same hash if Equals() returns true, which I guess this is.
Andy Shellam
This works great, thanks!
Andy Shellam
that's the point :-) the semantics imply this syllogism: `A == B => hash(A) == hash(A)`, so if we got just the hashes we can deduce `!(hash(A) == hash(B)) => !(A == B)`, but not the complementary.
fortran
+3  A: 

I think you should not override Equals(), ==, != or GetHashCode()

If you override any of these you should make sure that their semantics do not change. In your example they do.

For instance your == cannot be transitive for it it is then if P1 is 0.001 mm from P2, P2 is 0.001 mm from P3 and P1 is 0.002 mm from P3 then P1 == P2, P2 == P3 and P1 == P3, which is not what you want. In general you end up with all points being equal to all other points.

I would just stick to using a separate method for determining if points are close enough.

EDIT

With your override of == you can now write code like this:

if(P1 == P2 && P2 == P3 && P1 != P3)
{
    // Code here gets executed
}
Peter van der Heijden
If P1 is 0.002mm from P3, then when comparing P1 to P3, it'll return false. It'll only ever be comparing 2 points and is to check - "actually, are these two points close enough to be the same?"
Andy Shellam
I have added code to try to clarify what I think is wrong with your overloaded == operator
Peter van der Heijden
I don't need to compare 3 points though. Take 2 lines - one starts at the same point the other ends. Even though the points look the same, they're actually in the region of 0.00081mm apart, for example. So I just want to look at those 2 points and say "are they within 0.001mm of each other?" If they are, then they're taken as the same point and the lines (in my app) are joined. The overloaded == operator is working great, it's just getting the same hashcode for two points I consider the same that was making things tricky.
Andy Shellam
I appreciate that what you have done satisfies your current requirements. However by creating a class that uses well known operators and methods in a manner that is not standard you run several risks. Other people will have a hard time understanding your code or using your class, you will have a hard time using your class in standard scenario's e.g. as key in a dictionary, if the code that uses your class grows in size you may run into hard to find bugs, etc.
Peter van der Heijden
+3  A: 

It would be easier just to remove the dependency on the Distinct method. Implement a System.Collections.IComparer (or the generic equivalent) and use a simple collection like a list. Then determine if the item is in the list with the comparer, and not add it if it already is contained.

MaLio
I've also marked this as an answer as it actually seems more logical - don't know why I didn't think of it before! I had to use the List's BinarySearch method to be able to use IComparer - the Contains method requires an IEqualityComparer which guess what? It uses a hashcode!! I've implemented my own Distinct() method in my collection which is what I was leaning toward doing anyway.
Andy Shellam
+1  A: 

should I scrap the Equals, ==, != and GetHashcode overrides, and create my own MyPoint.IsEqualTo() and MyPointCollection.Distinct() methods?

Yes.

It needn't necessarily be some completely different data structure, though. In checking for a duplicate, you need to check the neighboring hashcodes, e.g. the hashes for (x+0.001,y), (x,y-0.001), and so on. This makes for just a constant-factor slowdown compared to the usual deduplication lookup, and it's not complicated, so it might be the way to go. (An obvious point, but I don't see it made explicitly here yet.)

Update: To clarify, let's look at a 1-dimensional version of the problem. The 'points' are single numbers, x. We consider x1 and x2 to match when abs(x1 - x2) < .001. Now we want to find out if x matches any of {x_0, ..., x_n}. The x_i are kept in a hashtable, where hash(x) = h(floor(1000*x)) for some function h() that spreads things around. To see if x is already in the table, we compute hash(x-.001), hash(x), and hash(x+.001), then we test if x matches any of the x_i in any of the three buckets. Any matching x_i cannot be in any other bucket.

In the 2-d variant, there are 9 neighboring buckets to check (counting the middle); in 3-d, 27.

Darius Bacon
This was similar to what I was considering doing but I still needed a constant number that was the same to generate the hash on. For example, 0.69890821 and 0.7000821 (0.69990821 +/- 0.001) will still generate completely different hashes to 0.69912839 +/- 0.001 even though I must consider them to be the same point.
Andy Shellam
I think I didn't make myself clear; I'll update my answer.
Darius Bacon
+1  A: 

This should be a clearer explanation of what Steck and Paolo said.

Suppose you did manage to write the GetHashCode method in the way you want.

Then for any points a and b, whatever the distance between them, a.GetHashCode() == b.GetHashCode().

Proof: suppose a < b. Split the distance between a and b into segments smaller than 0.001. I.e. a0 = a, a1 = a0 + 0.0005, a2 = a1 + 0.0005, etc. until you get to b.

Then a.GetHashCode() == a1.GetHashCode() == a2.GetHashCode() == ... == b.GetHashCode().

Alexey Romanov