views:

369

answers:

8

Given a 2 dimensional plane in which there are n points. I need to generate the equation of a line that divides the plane such that there are n/2 points on one side and n/2 points on the other. (by the way this not home work, I am just trying to solve the problem)

+1  A: 

I'd guess that a good way is to sort/sequence/order the points (e.g. from left to right), and then choose a line which passes through (or between) the middle point[s] in the sequence.

ChrisW
@chrisW if all points lie on a vertical line it doesnt work
mousey
@chrisW: What if all the points have the same x value? Also, if exactly half are on one side and half on the other, the line can't pass through any of them (maybe one, if n is odd).
BlueRaja - Danny Pflughoeft
+6  A: 
  1. Create an arbitrary line in that plane. Project each point onto that line a.k.a for each point, get the closest point on that line to that point.

  2. Order those points along the line in either direction, and choose a point on that line such that there is an equal number of points on the line in either direction.

  3. Get the line perpendicular to the first line which passes through that point. This line will have half the original points on either side.

There are some cases to avoid when doing this. Most importantly, if all the point are themselves on a single line, don't choose a perpendicular line which passes through it. In fact, choose that line itself so you don't have to worry about projecting the points. In terms of the actual mathematics behind this, vector projections will be very useful.

tlayton
Same problem as Chris' solution, multiple points could have same projection.
BlueRaja - Danny Pflughoeft
That's only an issue if those points are the two bracketing the middle. In that case, choose a different line in part 1.
tlayton
So you are suggesting "choose a random line, see if you can fit it to split the set in two, otherwise choose another random line?"
BlueRaja - Danny Pflughoeft
The final resulting line will be perpendicular to the line you choose. So yes, but the new line will give you a different result. But if you encounter this problem, it means that that particular set of points can't be split like this with a line in that particular direction.
tlayton
There is a deterministic way (though O(nlogn) currently) to find without any guessing (see my answer). Guessing is more practical though, as the chances of getting a bad guess are negligible.
Moron
+1  A: 

There are obvious cases where no solution is possible. E.g. when you have three heaps of points. One point at location A, Two points at location B, and five points at location C.

If you expect some decent distribution, you can probably get a good result with tlayton's algorithm. To select the initial line slant, you could determine the extent of the whole point set, and choose the angle of the largest diagonal.

relet
I don't find it obvious - in fact, I don't think it's true. I can't imagine a scenario where you couldn't divide C in half such that A and three points from C are in one half, B and two points from C are in the other.
BlueRaja - Danny Pflughoeft
You cannot divide C in half. Either all five points are on the line, or they are not.
relet
Oh I see what you are saying - I was under the impression all n points were distinct.
BlueRaja - Danny Pflughoeft
Heh, indeed. That needs to be clarified. I was thinking of n points as distinct objects in a list, which may refer to the same coordinates. :)
relet
A: 

I dont know how useful this is I have seen a similar problem...

If you already have the directional vector (aka the coefficients of the dimensions of your plane).

You can then find two points inside that plane, and by simply using the midpoint formula you can find the midpoint of that plane.

Then using the coefficients of that plane and the midpoint you can find a plane that is from equal distance from both points, using the general equation of a plane.

A line then would constitute in expressing one variable in terms of the other so you would find a line with equal distance between both planes.

There are different methods of doing this such as projection using the distance equation from a plane but I believe that would complicate your math a lot.

+11  A: 

I have assumed the points are distinct, otherwise there might not even be such a line.

If points are distinct, then such a line always exists and is possible to find using a deterministic O(nlogn) time algorithm.

Say the points are P1, P2, ..., P2n. Assume they are not all on the same line. If they were, then we can easily form the splitting line.

First translate the points so that all the co-ordinates (x and y) are positive.

Now suppose we magically had a point Q on the y-axis such that no line formed by those points (i.e. any infinite line Pi-Pj) passes through Q.

Now since Q does not lie within the convex hull of the points, we can easily see that we can order the points by a rotating line passing through Q. For some angle of rotation, half the points will lie on one side and the other half will lie on the other of this rotating line, or, in other words, if we consider the points being sorted by the slope of the line Pi-Q, we could pick a slope between the (median)th and (median+1)th points. This selection can be done in O(n) time by any linear time selection algorithm without any need for actually sorting the points.

Now to pick the point Q.

Say Q was (0,b).

Suppose Q was collinear with P1 (x1,y1) and P2 (x2,y2).

Then we have that

(y1-b)/x1 = (y2-b)/x2 (note we translated the points so that xi > 0).

Solving for b gives

b = (x1y2 - y1x2)/(x1-x2)

(Note, if x1 = x2, then P1 and P2 cannot be collinear with a point on the Y axis).

Consider |b|.

|b| = |x1y2 - y1x2| / |x1 -x2|

Now let the xmax be the x-coordinate of the rightmost point and ymax the co-ordinate of the topmost.

Also let D be the smallest non-zero x-coordinate difference between two points (this exists, as not all xis are same, as not all points are collinear).

Then we have that |b| <= xmax*ymax/D.

Thus, pick our point Q (0,b) to be such that |b| > b_0 = xmax*ymax/D

D can be found in O(nlogn) time.

The magnitude of b_0 can get quite large and we might have to deal with precision issues.

Of course, a better option is to pick Q randomly! With probability 1, you will find the point you need, thus making the expected running time O(n).

If we could find a way to pick Q in O(n) time (by finding some other criterion), then we can make this algorithm run in O(n) time.

Moron
I'd been racking my brain all night trying to figure this one out, but you beat me again, M. As soon as you said *"suppose we magically had a point Q on the y-axis..."* it came to me; it seems so easy now... but, too late! +1, amazing work
BlueRaja - Danny Pflughoeft
@BlueRaja: Thanks! O(n) seems so close, though...
Moron
@Moron: in 2xmax*ymax/D, the 2 isn't necessary: |x1y2 - y1x2| <= xmax * ymax because xi*yj > 0.
Nicolas Viennot
@Pafy. You are right. I have edited the answer to reflect that. Thanks!
Moron
@Moron: You wrote: "...in other words, sort by the slope of the line Pi-Q and pick a slope between the (median)th and (median+1)th points. This can be done in O(n) time by any linear time selection algorithm." If you sorted first, wouldn't picking the median be O(1)? Also, why not forgo the sort and just use the O(n) selection instead?
andand
@andand: I talked about sorting only to visualize what is going on. We don't actually need to sort, as we need only the 'middle' element.
Moron
@andand: I have edited the answer to make it clearer. Thanks!
Moron
I didn't know before that selection is in O(n). Thanks for the clarification! But with that a deterministic O(n) time algortihm is also possible (see my answer).
rudi-moore
I came up with a bit more reasonable value for b (ie. one that lies around the median y-value), but I can't seem to do it in O(n). It requires finding a lower bound for the minimum difference between two values in an unsorted list (or finding a lower-bound for the area of a triangle formed by b and any other two points), both of which I can't seem to do any better than O(n log n)
BlueRaja - Danny Pflughoeft
@BlueRaja: Min difference between two points in unsorted list might help you solve Element Distinctness problem, so O(n) is doubtful for that. Of course, the model matters, so don't know. Min difference can probably be reduced to min-area: Take all points collinear, so min-difference accounts for area (as height is same).
Moron
@M: That's disappointing to hear. I added my calculation for b anyways, though it is also `O(n log n)`
BlueRaja - Danny Pflughoeft
+1  A: 
andand
+1  A: 

I picked up the idea from Moron and andand and continued to form a deterministic O(n) algorithm.

I also assumed that the points are distinct and n is even (thought the algorithm can be changed so that uneven n with one point on the dividing line are also supported).

The algorithm tries to divide the points with a vertical line between them. This only fails if the points in the middle have the same x value. In that case the algorithm determines how many points with the same x value have to be on the left and lower site and and accordingly rotates the line.

I'll try to explain with an example. Let's asume we have 16 points on a plane. First we need to get the point with the 8th greatest x-value and the point with the 9th greatest x-value. With a selection algorithm this is possible in O(n), as pointed out in another answer. If the x-value of that points is different, we are done. We create a vertical line between that two points and that splits the points equal.

Problematically now is if the x-values are equal. So we have 3 sets of points. That on the left side (x < xa), in the middle (x = xa) and that on the right side (x > xa). The idea now is to count the points on the left side and calculate how many points from the middle needs to go there, so that half of the points are on that side. We can ignore the right side here because if we have half of the points on the left side, the over half must be on the right side.

So let's asume we have we have 3 points (c=3) on the left side, 6 in the middle and 7 on the right side (the algorithm doesn't know the count from the middle or right side, because it doesn't need it, but we could also determine it in O(n)). We need 8-3=5 points from the middle to go on the left side. The points we already got in the first step are useless now, because they are only determined by the x-value and can be any of the points in the middle.

We want the 5 points from the middle with the lowest y-value on the left side and the point with the highest y-value on the right side. Again using the selection algorithm, we get the point with the 5th greatest y-value and the point with the 6th greatest y-value. Both points will have the x-value equal to xa, else we wouldn't get to this step, because there would be a vertical line.

Now we can create the point Q in the middle of that two points. Thats one point from the resulting line. Another point is needed, so that no points from the left or right side are divided. To get that point we need the point from the left side, that has the lowest angle (bh) between the the vertical line at xa and the line determined by that point and Q. We also need that point from the right side (with angle ag). The new point R is between the point with the lower angle and a point on the vertical line (if the lower angle is on the left side a point above Q and if the lower angle is on the right side a point below Q).

The line determined by Q and R divides the points in the middle so that there are a even number of points on both sides. It doesn't divide any points on the left or right side, because if it would that point would have a lower angle and would have been choosen to calculate R.

From the view of a mathematican that should work well in O(n). For computer programs it is fairly easy to find a case where precision becomes a problem. An example with 4 points would be A(0, 100000000), B(0, 100000001), C(0, 0), D(0.0000001, 0). In this example Q would be (0, 100000000.5) and R (0.00000005, 0). Which gives B and C on the left side and A and D on the right side. But it is possible that A and B are both on the dividing line, because of rounding errors. Or maybe only one of them. So it belongs to the input values if this algorithm suits to the requirements.

plane of points example

  1. get that two points Pa(xa, ya) and Pb(xb, yb)
    which are the medians based on the x values > O(n)
  2. if xa != xb you can stop here
    because a y-axis parallel line between that two points is the result > O(1)
  3. get all points where the x value equals xa > O(n)
  4. count points with x value less than xa as c > O(n)
  5. get the lowest point Pc based on the y values from the points from 3. > O(n)
  6. get the greatest point Pd based on the y values from the points from 3. > O(n)
  7. get the (n/2-c)th greatest point Pe based on the y values from the points from 3. > O(n)
  8. also get the next greatest point Pf based on the y values from the points from 3. > O(n)
  9. create a new point Q (xa, (ye+yf)/2) between Pe and Pf > O(1)
  10. for all points Pi calculate
    the angle ai between Pc, Q and Pi and
    the angle bi between Pd, Q and Pi > O(n)
  11. get the point Pg with the lowest angle ag (with ag>0° and ag<180°) > O(n)
  12. get the point Ph with the lowest angle bh (with bh>0° and bh<180°) > O(n)
  13. if there aren't any Pg or Ph (all points have same x value)
    create a new point R (xa+1, 0) anywhere but with a different x value than xa
    else if ag is lower than bh
    create a new point R ((xc+xg)/2, (yc+yg)/2) between Pc and Pg
    else
    create a new point R ((xd+xh)/2, (yd+yh)/2) between Pd and Ph > O(1)
  14. the line determined by Q and R divides the points > O(1)
rudi-moore
@Rudi-moore: Can you please explain what you are trying to do, rather than just give the steps? It is a bit hard to read. Also, few questions, what happens if a_g = b_h? Also, is it possible that the line QR divides points to the right of points in step 3? (i.e x-cordinate > x_a) in which case you might not get an n/2-n/2 split?
Moron
@Rudi-moore: Rotating the line around Q might not work. I believe (though not completely sure) we can give you a configuration of points such that no line through Q will make an even split. You seem to be completely ignoring the points to the right of the vertical line. Also, the case when a_h = b_h and c=0 is undiscussed.
Moron
@M Sorry i'm not a native speaker, mabye that makes it a bit harder to understand. I edited the post and hope to make it clearer with an example. For your question: if a_g = b_h you can choose P_g or P_h. It doesn't matter. Both will generate the same line. It is not possible that QR divides points on the right side, if it would another R would have been choosen. I'm not completly ignoring points on the right side. Only in the first steps. c=0 is only problematically if more than n/2 points are in the middle and in that case n/2 points from the middle go to left side.
rudi-moore
@Rudi-moore: Good job! This looks right to me. It was simpler than I imagined it would be. +1.
Moron
@mousey, @BlueRaja: I suggest you take a look at this answer. I believe this works! I think this derserves to be the accepted answer.
Moron
+1  A: 

To add to M's answer: a method to generate a Q (that's not so far away) in O(n log n).

To begin with, let Q be any point on the y-axis ie. Q = (0,b) - some good choices might be (0,0) or (0, (ymax-ymin)/2).

Now check if there are two points (x1, y1), (x2, y2) collinear with Q. The line between any point and Q is y = mx + b; since b is constant, this means two points are collinear with Q if their slopes m are equal. So determine the slopes mi for all points and check if there are any duplicates: (amoritized O(n) using a hash-table)

If all the m's are distinct, we're done; we found Q, and M's algorithm above generates the line in O(n) steps.
If two points are collinear with Q, we'll move Q up just a tiny amount ε, Qnew = (0, b + ε), and show that Qnew will not be collinear with two other points.

The criterion for ε, derived below, is:

ε < mminΔ*xmin

To begin with, our m's look like this:

mi = yi/xi - b/xi

Let's find the minimum difference between any two distinct mi and call it mminΔ (O(n log n) by, for instance, sorting then comparing differences between mi and i+1 for all i)

If we fudge b up by ε, the new equation for m becomes:

mi,new = yi/xi - b/xi - ε/xi
       = mi,old - ε/xi

Since ε > 0 and xi > 0, all m's are reduced, and all are reduced by a maximum of ε/xmin. Thus, if

ε/xmin < mminΔ, ie.
ε < mminΔ*xmin

is true, then two mi which were previously unequal will be guaranteed to remain unequal.


All that's left is to show that if m1,old = m2,old, then m1,new =/= m2,new. Since both mi were reduced by an amount ε/xi, this is equivalent to showing x1 =/= x2. If they were equal, then:

y1 = m1,oldx1 + b = m2,oldx2 + b = y2

Contradicting our assumption that all points are distinct. Thus, m1, new =/= m2, new, and no two points are collinear with Q.

BlueRaja - Danny Pflughoeft
We can assume x1 =/= x2 as then the line will be parallel to the y-axis.
Moron
@M: Yep; that's essentially the same thing (it wouldn't necessarily be true if the points weren't distinct)
BlueRaja - Danny Pflughoeft