Given a 2 dimensional plane in which there are n points. I need to generate the equation of a line that divides the plane such that there are n/2 points on one side and n/2 points on the other. (by the way this not home work, I am just trying to solve the problem)
I'd guess that a good way is to sort/sequence/order the points (e.g. from left to right), and then choose a line which passes through (or between) the middle point[s] in the sequence.
Create an arbitrary line in that plane. Project each point onto that line a.k.a for each point, get the closest point on that line to that point.
Order those points along the line in either direction, and choose a point on that line such that there is an equal number of points on the line in either direction.
Get the line perpendicular to the first line which passes through that point. This line will have half the original points on either side.
There are some cases to avoid when doing this. Most importantly, if all the point are themselves on a single line, don't choose a perpendicular line which passes through it. In fact, choose that line itself so you don't have to worry about projecting the points. In terms of the actual mathematics behind this, vector projections will be very useful.
There are obvious cases where no solution is possible. E.g. when you have three heaps of points. One point at location A, Two points at location B, and five points at location C.
If you expect some decent distribution, you can probably get a good result with tlayton's algorithm. To select the initial line slant, you could determine the extent of the whole point set, and choose the angle of the largest diagonal.
I dont know how useful this is I have seen a similar problem...
If you already have the directional vector (aka the coefficients of the dimensions of your plane).
You can then find two points inside that plane, and by simply using the midpoint formula you can find the midpoint of that plane.
Then using the coefficients of that plane and the midpoint you can find a plane that is from equal distance from both points, using the general equation of a plane.
A line then would constitute in expressing one variable in terms of the other so you would find a line with equal distance between both planes.
There are different methods of doing this such as projection using the distance equation from a plane but I believe that would complicate your math a lot.
I have assumed the points are distinct, otherwise there might not even be such a line.
If points are distinct, then such a line always exists and is possible to find using a deterministic O(nlogn) time algorithm.
Say the points are P1, P2, ..., P2n. Assume they are not all on the same line. If they were, then we can easily form the splitting line.
First translate the points so that all the co-ordinates (x and y) are positive.
Now suppose we magically had a point Q on the y-axis such that no line formed by those points (i.e. any infinite line Pi-Pj) passes through Q.
Now since Q does not lie within the convex hull of the points, we can easily see that we can order the points by a rotating line passing through Q. For some angle of rotation, half the points will lie on one side and the other half will lie on the other of this rotating line, or, in other words, if we consider the points being sorted by the slope of the line Pi-Q, we could pick a slope between the (median)th and (median+1)th points. This selection can be done in O(n) time by any linear time selection algorithm without any need for actually sorting the points.
Now to pick the point Q.
Say Q was (0,b).
Suppose Q was collinear with P1 (x1,y1) and P2 (x2,y2).
Then we have that
(y1-b)/x1 = (y2-b)/x2 (note we translated the points so that xi > 0).
Solving for b gives
b = (x1y2 - y1x2)/(x1-x2)
(Note, if x1 = x2, then P1 and P2 cannot be collinear with a point on the Y axis).
Consider |b|.
|b| = |x1y2 - y1x2| / |x1 -x2|
Now let the xmax be the x-coordinate of the rightmost point and ymax the co-ordinate of the topmost.
Also let D be the smallest non-zero x-coordinate difference between two points (this exists, as not all xis are same, as not all points are collinear).
Then we have that |b| <= xmax*ymax/D.
Thus, pick our point Q (0,b) to be such that |b| > b_0 = xmax*ymax/D
D can be found in O(nlogn) time.
The magnitude of b_0 can get quite large and we might have to deal with precision issues.
Of course, a better option is to pick Q randomly! With probability 1, you will find the point you need, thus making the expected running time O(n).
If we could find a way to pick Q in O(n) time (by finding some other criterion), then we can make this algorithm run in O(n) time.
I picked up the idea from Moron and andand and continued to form a deterministic O(n) algorithm.
I also assumed that the points are distinct and n is even (thought the algorithm can be changed so that uneven n with one point on the dividing line are also supported).
The algorithm tries to divide the points with a vertical line between them. This only fails if the points in the middle have the same x value. In that case the algorithm determines how many points with the same x value have to be on the left and lower site and and accordingly rotates the line.
I'll try to explain with an example. Let's asume we have 16 points on a plane. First we need to get the point with the 8th greatest x-value and the point with the 9th greatest x-value. With a selection algorithm this is possible in O(n), as pointed out in another answer. If the x-value of that points is different, we are done. We create a vertical line between that two points and that splits the points equal.
Problematically now is if the x-values are equal. So we have 3 sets of points. That on the left side (x < xa), in the middle (x = xa) and that on the right side (x > xa). The idea now is to count the points on the left side and calculate how many points from the middle needs to go there, so that half of the points are on that side. We can ignore the right side here because if we have half of the points on the left side, the over half must be on the right side.
So let's asume we have we have 3 points (c=3) on the left side, 6 in the middle and 7 on the right side (the algorithm doesn't know the count from the middle or right side, because it doesn't need it, but we could also determine it in O(n)). We need 8-3=5 points from the middle to go on the left side. The points we already got in the first step are useless now, because they are only determined by the x-value and can be any of the points in the middle.
We want the 5 points from the middle with the lowest y-value on the left side and the point with the highest y-value on the right side. Again using the selection algorithm, we get the point with the 5th greatest y-value and the point with the 6th greatest y-value. Both points will have the x-value equal to xa, else we wouldn't get to this step, because there would be a vertical line.
Now we can create the point Q in the middle of that two points. Thats one point from the resulting line. Another point is needed, so that no points from the left or right side are divided. To get that point we need the point from the left side, that has the lowest angle (bh) between the the vertical line at xa and the line determined by that point and Q. We also need that point from the right side (with angle ag). The new point R is between the point with the lower angle and a point on the vertical line (if the lower angle is on the left side a point above Q and if the lower angle is on the right side a point below Q).
The line determined by Q and R divides the points in the middle so that there are a even number of points on both sides. It doesn't divide any points on the left or right side, because if it would that point would have a lower angle and would have been choosen to calculate R.
From the view of a mathematican that should work well in O(n). For computer programs it is fairly easy to find a case where precision becomes a problem. An example with 4 points would be A(0, 100000000), B(0, 100000001), C(0, 0), D(0.0000001, 0). In this example Q would be (0, 100000000.5) and R (0.00000005, 0). Which gives B and C on the left side and A and D on the right side. But it is possible that A and B are both on the dividing line, because of rounding errors. Or maybe only one of them. So it belongs to the input values if this algorithm suits to the requirements.
- get that two points Pa(xa, ya) and Pb(xb, yb)
which are the medians based on the x values> O(n)
- if xa != xb you can stop here
because a y-axis parallel line between that two points is the result> O(1)
- get all points where the x value equals xa
> O(n)
- count points with x value less than xa as c
> O(n)
- get the lowest point Pc based on the y values from the points from 3.
> O(n)
- get the greatest point Pd based on the y values from the points from 3.
> O(n)
- get the (n/2-c)th greatest point Pe based on the y values from the points from 3.
> O(n)
- also get the next greatest point Pf based on the y values from the points from 3.
> O(n)
- create a new point Q (xa, (ye+yf)/2)
between Pe and Pf
> O(1)
- for all points Pi calculate
the angle ai between Pc, Q and Pi and
the angle bi between Pd, Q and Pi> O(n)
- get the point Pg with the lowest angle ag (with ag>0° and ag<180°)
> O(n)
- get the point Ph with the lowest angle bh (with bh>0° and bh<180°)
> O(n)
- if there aren't any Pg or Ph (all points have same x value)
create a new point R (xa+1, 0) anywhere but with a different x value than xa
else if ag is lower than bh
create a new point R ((xc+xg)/2, (yc+yg)/2) between Pc and Pg
else
create a new point R ((xd+xh)/2, (yd+yh)/2) between Pd and Ph> O(1)
- the line determined by Q and R divides the points
> O(1)
To add to M's answer: a method to generate a Q (that's not so far away) in O(n log n)
.
To begin with, let Q be any point on the y-axis ie. Q = (0,b)
- some good choices might be (0,0) or (0, (ymax-ymin)/2).
Now check if there are two points (x1, y1), (x2, y2) collinear with Q. The line between any point and Q is y = mx + b
; since b is constant, this means two points are collinear with Q if their slopes m
are equal. So determine the slopes mi for all points and check if there are any duplicates: (amoritized O(n)
using a hash-table)
If all the m's are distinct, we're done; we found Q, and M's algorithm above generates the line in O(n)
steps.
If two points are collinear with Q, we'll move Q up just a tiny amount ε, Qnew = (0, b + ε), and show that Qnew will not be collinear with two other points.
The criterion for ε, derived below, is:
ε < mminΔ*xmin
To begin with, our m's look like this:
mi = yi/xi - b/xi
Let's find the minimum difference between any two distinct mi and call it mminΔ (O(n log n)
by, for instance, sorting then comparing differences between mi and i+1 for all i)
If we fudge b up by ε, the new equation for m becomes:
mi,new = yi/xi - b/xi - ε/xi = mi,old - ε/xi
Since ε > 0 and xi > 0, all m's are reduced, and all are reduced by a maximum of ε/xmin. Thus, if
ε/xmin < mminΔ, ie. ε < mminΔ*xmin
is true, then two mi which were previously unequal will be guaranteed to remain unequal.
All that's left is to show that if m1,old = m2,old, then m1,new =/= m2,new. Since both mi were reduced by an amount ε/xi, this is equivalent to showing x1 =/= x2. If they were equal, then:
y1 = m1,oldx1 + b = m2,oldx2 + b = y2
Contradicting our assumption that all points are distinct. Thus, m1, new =/= m2, new, and no two points are collinear with Q.