Best way to solve a distance problem

views:

771

answers:

+6 Q:

Best way to solve a distance problem

There are a set of point that are collinear. The problem is to add a new point that lies in the same line so that sum of distance from the new point to existing points is minimum. (Assume that the point lies in a horizontal line).

The solution I thought of is:

Sort the points according to their x-coordinates (y does not matter anyway).
If the no.of points is odd, place the new point at same coordinate as middle one.
Otherwise, place the point at midpoint of n/2 and n/2 + 1 points.

I cannot prove that above method works. Is it correct? Also any better way of solving this?

Well, you certainly don't need to sort them. Just take the average of their x and y coordinates. This will work in N-dimensions as long as they are collinear.

EDIT: I realized that I'm computing the mean. You are computing the median, as stated in another answer, which is probably more likely to get the minimum distance to all points.

EDIT2: The median is the correct answer for an odd number of points. For an even number of points, it's any point along the innermost line segment defined by the area with the same number of points on both sides.

Proof (ish): You found a correct answer, but for even number of points there are muliple.

For any two points, any point on the line segment between those points will have the sum of its distances to both points be the same. Any point outside that line segment will have its distances greater than on the line segment.

So, to find the point with the smallest distance from all collinear points, you need to decompose the problem into contained sets of two points, i.e., line segments that are fully contained in other line segments. Then, just take the smallest line segment (or point in the case of odd number of points) and pick a point on that line segment. All points on that line segment will have the minimum distance from all points for that particular configuration.

If you want to graph this, all distance from point graphs will have the same shape: \/ For a line segment, the distance graph will look like this: _/

In essence, you want to add all distance graphs and find the minimum.

MSN 2009-02-05 06:28:30

+1 A:

Following is not the best solution. But gives the correct value.

First find the angle of your line, rotate all points to reverse of that angle to become a horizontal line
Sort all X points, since Y will be constant after rotation
Let it be X(1), X(2), X(3), ... X(N)
Then store the Computed distance D(R) for every R from 1 to N as [(2R-N)*X(R)] - [X(1)+X(2)+X(3)..+X(R)] + [X(R+1)+X(R+2)+X(R+3)...+X(N)]
Get minimun D(R).
Rotate back X(R), Y to original angle.
That is your expected value.
If incase D(R) & D(R+1) are same, then all rotated points inbetween X(R),Y & X(R+1), Y will be your expected value.
Interstingly if R is mid, then answer will be minimum since number of additions [X(1)+..X(R)] and [X(R+1)+..X(N)] are almost equal then difference is minimum, otherwise if incase number of additons in one side is higher, then always difference will be higher than number of equal additons. Similarly if there are even number of point, All points in between (N)/2 to (N/2)+1 will have same equal distance..
Hence MEDIAN is correct answer.

Hope this should work for you.

lakshmanaraj 2009-02-05 07:03:37

Interesting problem!

Edit: The following is wrong, but i don't understand where my mistake is.

The median solution you explained in your question is an incorrect solution.

Since you want to prove correctness, i'd like to try solve this problem as follows:

First of all, since all points are collinear, we can easily split them into their X- and Y-component. We then solve the problem independently for X and Y. Let's assume we got the values V[0] to V[n-1] with n being the number of points.

Now the problem becomes computing the x so that SUM( (V[0 ... n-1] - x)^2 ) becomes minimal. We build the derivative 2*n*x - 2*SUM( V[0 ... n-1] ).

This becomes 0 for -n * x + SUM( V[0 ... n-1] ). Accordingly, x = SUM( V[0 ... n-1] ) / n.

So just add all values and divide them by n to get the correct minimum solution. After doing this for both x and y, you get the desired point.

This also equals the assumption i made when first thinking about your problem and it works for a few values i tested. Hope this helps. :)

mafutrct 2009-02-05 07:26:20

The mistake is that you're finding the point that minimizes the sum of squared distances, while the question asks for minimizing the sum of distances. [Also, it is entirely unnecessary to split X and Y. Coordinate axes are arbitrary; you can think of the line as an axis, and work in 1 dimension.]

ShreevatsaR 2009-02-05 14:06:03

Regarding the split, i think it is necessary in the special case of all points being perpendicular to the used axis.

mafutrct 2009-02-05 15:00:44

No, it is never necessary. Axes are arbitrary. The line is a valid choice of axis.

ShreevatsaR 2009-02-06 00:24:15

The line of the points itself is certainly a correct axis. I referred to using an axis perpendicular to the line of points (stupid, but might happen). This way, all values become the same and sorting them would not make any sense. By splitting, this will be avoided automatically.

mafutrct 2009-02-06 08:34:09

Let me try explaining again. Points might be given to you in terms of coordinates. But they are just points in space. And to prove this theorem about them, you can *throw away* the coordinates you are given, and just work with the points on the line. What axis you use to think with is *your* choice.

ShreevatsaR 2009-02-09 22:44:20

+1 A:

This is solved by finding the median. See http://en.wikipedia.org/wiki/Geometric_median ("Properties" section). It would be sufficient to perform the calculations on either the x or y coordinates. Either can be used as long as the coordinates along the selected axis are not constant for the line.

norheim.se 2009-02-05 08:03:38

+7 A:

I believe that the median is the correct answer if we wish to minimise the sum of the absolute distances, which is the obvious interpretation. The mean is correct if we wish to minimise the sum of the squared distances. For points at y = 0 and x = 0, 1, 2, 101 the mean is 26 and we can take the median to be 1.5. The sum of absolute distances from the mean is 149 and the sum of absolute distances from the median is 102.

At the median the number of points to your left is the same as the number of points to your right. Moving left by a small amount increases all the distances to the points to your right and decreases all distances to the points to your left by the same amount - no change. If you are a point or more away from the median you can decrease the sum of absolute distances by moving towards the area where there are more points. This decreases the sum of the distances orginationg from the area where there are more points by more than it increases the sum of the distances originating from the area where there are fewer points - So if you are not at the median you can improve things by moving towards it. This is a fairly standard result in statistics.

2009-02-05 08:10:32

If you have two points, any point in between them will have the same distance sum. Therefore, the median is not the only correct answer.

MSN 2009-02-10 16:50:06

+3 A:

I think you can prove it by induction. I'll do the odd ones, and you can expand it:

Without loss of generality, we can say that the points lie along the line y = 0, and that the center point is at (0, 0). This is because affine transforms like rotations and translation do not affect relative distances.

Let the set of points on the line be defined as P = { (x, 0) <= x is real } Define the distance from point X as sum( P => |P - X| )

Lemma 1: The center point must be along the line y = 0. Assume that the center point is at (x, y) with y != 0. Consider the point (x, 0).

sum(P - (x,y)) = sum( sqrt( (p-x)*(p-x) + (0-y)*(0-y) ) )
               = sum( sqrt( p*p - 2xp + x*x + y*y ) )
               > sum( sqrt( p*p - 2xp + x*x + (0-0)*(0-0) ) )
               = sum(P - (x,0))

This is a contradiction, so y = 0 must be true.

Base case of 1 element: It is an odd number of elements, so choose it: (0, 0). Assume that there is a point X = (x, 0) such that x is closer. Then this means that |x - 0| < (0 - 0), or that |x| < 0, which is impossible. Therefore (0, 0) is the center point.

Base case of 3 elements: It is an odd number of elements, so choose the middle point: (0, 0). Without loss of generality, let the other two points be (a<0, 0) and (b>0, 0). Assume that there is a point X = (x, 0) that is closer. Then this means that:

|x - 0| + |x - a| + |x - b| < |0 - 0| + |0 - a| + |0 - b|

<=>

|x| + |x - a| + |x - b| < |a| + |b|

However:

|x| + |x - a| + |x - b| >= |x| + |a| + |b| >= |a| + |b|, which contradicts the assumption, so therefore (0, 0) is the center point.

Case with N elements (N odd). Assume that all odd sets of points satisfy the conditions above. Let P be the set with N elements, and arrange them like so:

{ (a, 0), Q={set of N-2 elements, with center at (0, 0)}, (b, 0) }

Assume that the center point is X = (x, 0).

sum(P - X) = |x-a| + |x-b| + sum(Q - X)
           > |x-a| + |x-b| + sum(Q - (0,0))
           >= |a| + |b| + sum(Q - (0,0))
           = sum(P - (0,0))

Which means that the assumption is contradicted, so (0,0) must be the center point.

That proves it for all of the odd numbers. The even numbers should be similar.

FryGuy 2009-02-05 10:18:41

Above, mcdowella, FryGuy, and ShreevatsaR had ideas in my response below.

Let n be the number of points on the line. Let the points be p(1), ..., p(n), labeled from left to right.

Case n = 1. True.

Case n = 2. True. You can choose any point between the two points.

Case n >= 3. Introduce an x-y coordinate system. Rotate the plane so that the points are on the x-axis.

The point minimizing the distance to the other points must be in the interval [p(1), p(n)], because of the following reasoning: If it were to the left of p(1), move it right a little (small enough so that x doesn't hit p(1)), decreasing the distance. Similarly, if it were to the right of p(n).

Pick any point x on the line, not necessarily one of p(1), ..., p(n).

Let D be the sum of the distances from x to the other points.

Let L be the number of points to the left of x. Similarly, let R be the number of points to the right.

There are three subcases.

Subcase 1: x is the median. So L = R. If we move x a small amount e to the left, the distance sum becomes D - Le + Re + e = D + e > D. Similarly, for moving right. So the median gives a local minimum.

Subcase 2: x is to the left of the median. Similar to the following subcase.

Subcase 3: x is to the right of the median. So L >= R. There are two points such that x is in (p(i), p(i + 1)] (the left endpoint is excluded, but the right is included). Let e = x - p(i).

Move x to the left by e. We can have x become p(i). The distance sum becomes D - Le + Re = D - (L - R)e <= D. That is, we possibly decrease D. We don't increase it.

We keep moving x to the left, possibly decreasing the distance sum, and not increasing it, until x becomes the median. So the median gives a global minimum.

2009-02-08 22:25:14

ansaurus

tags:

views:

answers:

Best way to solve a distance problem

related questions