views:

448

answers:

9

Let's imagine we have a plane with some points on it. We also have a circle of given radius.

I need an algorithm that determines such position of the circle that it covers maximal possible amount of points.
If there are many such positions, the algorithm should return one of them.
Precision is not important and algorithm may do small mistakes.

Here is an example picture:

Input:
int n (n<=50) - number of points;
float x[n] and float y[n] - arrays with points' X and Y coordinates;
float r - radius of the circle.

Output:
float cx and float cy - coordinates of the circle's center

Lightning speed of the algorithm is not required, but it shouldn't be too slow (because I know a few slow solutions for this situation)

C++ code is preferred, but not obligatory.

Thank you...

A: 

This is famous K-closest point algorithm. Described here: http://www.cs.ucsb.edu/~suri/cs235/ClosestPair.pdf

Jack
No it's not. You could have a best circle centered on a point that is *not* part of the set.
Mau
This is not the correct answer
BlaXpirit
if it isn't correct downvote it
fuzzy lollipop
Don't have enough reputation
BlaXpirit
you do now :-) keep asking interesting questions!
fuzzy lollipop
@BlaXpirit if the answer isn't correct then please unselect it as the accepted solution.
chollida
O_o How did that happen!?
BlaXpirit
+14  A: 

Edited to better wording, as suggested :

Basic observations :

  • I assume the radius is one, since it doesn't change anything.
  • given any two points, there exists at most two unit circles on which they lie.
  • given a solution circle to your problem, you can move it until it contains two points of your set while keeping the same number of points of your set inside it.

The algorithm is then:

  • For each pair of points, if their distance is < 2, compute the two unit circles C1 and C2 that pass through them.
  • Compute the number of points of your set inside C1 and C2
  • Take the max.
Alexandre C.
I like this solution. It may be the one I need... But I'm not sure it's always correct
BlaXpirit
Can you explain the rationale Alex?
Mau
It is O(n^3) in the number of points. You can check whether C is inside the circle of radius 1 passing through A and B (with one prescribed orientation) by calculating one determinant I think. But please beware of roundoff errors, they can really spoil this kind of algorithms.
Alexandre C.
I didn't notice your answer :(, wrote absolutelly the same with bit explanations.
Max
@Mau, @BlaXpirit: You can move the disk a little until it touches two points without changing the number of points inside it.
Alexandre C.
+1 It is obviously true solution. `O(n^3)` is not good but it is mathematically true solution.
Alexey Malistov
@Alexey: it's not really n^3 since a lot of pairs are discarded as long as their distance is > 2, so it's rather O(n^2) on average.
Alexandre C.
Accepted. Reasons: This solution seems to be correct. The other ones are difficult, or I simply don't understand them
BlaXpirit
I don't really get it. What does this even mean? `Take two points, you have (at most) two unit circles that pass through it`? What's `it`? Why are you picking unit circles? This should really be explained better.
IVlad
@IVlad: ok, I reword my quickly written answer.
Alexandre C.
`given a solution circle to your problem, you can move it until it contains two points of your set while keeping the same number of points of your set inside it.` - I don't think this is true. Anyway, ignoring that, consider a triangle that (barely) fits inside a circle of radius `r`. The distance between each two points of the triangle is `> 2`. Consider two points very very far away from the triangle with mutual distance `< 2`. I think your algorithm would only take those 2 points as solution. Your algorithm is wrong in my opinion, it doesn't even use `r` anywhere.
IVlad
@ IVlad: For every two points that are < 2r apart, there are two circles of radius r that could pass through these two points. (If two points are *exactly* 2r apart, there's only one circle.) With regards to the two points, these circles are "optimal": Since the points are on the very border of the circle, the circle covers a maximum of additional space (and, possibly, further points). Iterating over all point pairs gives you all possible circles with >= 2 points in them. Check these circles for the number of points they each contain, and select the one circle with the highest number.
DevSolar
@DevSolar - and if there are no points with mutual distance `< 2`?
IVlad
@ IVlad: Assuming you meant <= 2, then each point has a circle of radius r around it that contains only that point.
DevSolar
@ IVlad: "I don't think this is true." - it is. You move the circle in any direction until the first point is on its line. Then you "swing" the circle around that point until the second point is on its line. You now have a circle that touches two points, without having lost any points. (Rather evident, I think. Unless the circle didn't contain 2+ points to begin with.)
DevSolar
@IVlad, @DevSolar: a correct proof would involve topology and the "toll-passing lemma". The argument relies on the fact that the function "center of circle -> number of points" is integer-valued.
Alexandre C.
@DevSolar - I think my confusion stems from the fact that you used `2` in your answer. Shouldn't it be `2*r`?
IVlad
Umm.. actually it was the only answer I fully understood.
BlaXpirit
But aren't there actually some points outside the "unit" circle, which will be covered by a big circle?
doc
@ doc: The radius of all circles in the solution is identical (r). @ IVlad: Yes, the "2" in the answer should read "2r".
DevSolar
@DevSolar - well, +1 then. I really think that should be made more clear, you really had me scratching my head when I saw no mention of `r` anywhere.
IVlad
OK then. It shouldn't be called a unit circle, but rather a circle with radius r. It's a good solution but it is written in the terrible way. "you can move it until it contains two points of your set while keeping the same number of points of your set inside it." - until you'll find two points on its edge, while keeping the same number of points inside it ;/
doc
@doc: edited, I assumed r = 1 without saying it.
Alexandre C.
This would make a great LINQ project.
maxwellb
This could be reduced from O(n^3) to O(n^2 * mlogn) (m is average number of points in each tested circle). If you put all the points in a quad-tree and use that when testing each circle to tally the other points it contains. The big-O math would get messy, but it could be further reduced by using the quad-tree to identify the candidate-pairs of points that are close enough to be worth testing.
Alan
@Alan: It is already less than O(n^3) since on average only O(n) pairs of points are within 2r (assuming uniform distribution). So you actually get something O(n^2)-ish.
Alexandre C.
I wonder what happens when you take floating point inaccuracies into account.
starblue
+1  A: 

The problem reverts back to finding the global optimum of a function f :R x R -> N. The input for f is the center point of the circle, and the value, of course, is the number of included points from the set. The graph of the function will be discontinuous, staircase-like.

You could start by testing the function in each point corresponding to a point in the set, order the points by decreasing values of f, then intensify search around those points (for example moving out along a spiral).

Another option is to consider all intersection points of segments connecting any pairs of points in the set. Your optimal point will be at one of these intersections I think, but their number is probably too big to consider.

You may also hybridise options 1 and 2, and consider intersections of the segments generated from points around 'good set points'.

Anyhow, unless the number of set points is low (allowing you to check all intersections), I don't think you can guarantee to find the optimum (just a good solution).

Mau
I can see the same answer posted three times in total - probably by mistake. Can you delete the two instances of it?
Suma
Done. SO technical issues :-)
Mau
The number of points is low (I've just edited the question). But I don't understand a few things, like... how can you make a graph of 2-parameter function?
BlaXpirit
minimization of discrete-valued functions is *very* tough.
Alexandre C.
Well, you don't have to draw the graph, I was just picturing it in my head... like this anyway: http://soft.proindependent.com/doc/manual-en/pics/3D-function-plot-modified.png
Mau
@Alexandre. In this case, however the set of points to consider is finite.
Mau
@Mau: The set of points is *always* finite because computers cannot deal with infinite sets. Still, it's a non-linear discrete optimization problem, which is the toughest class of optimization problems. There are no known good generic algorithms for this class of problems, as opposed to linear or continuous optimization problems.
Philipp
@Philipp Well I think really only linear (non-integer) are tractable in the general case. But we digress :-)
Mau
A: 

If it is true that precision is not important and algorithm may do small mistakes then I think the following.

Let f(x,y) is a function which has a maximum at the point (0,0) and is only significant at the points inside of circle of radius R. For example, f(x,y) = e^{(x^2 + y^2)/ (2 * R^2)}.

Let (x_i,y_i) are points and E_i(x,y) = f(x - x_i, y - y_i).

Your problem is to find the maximum of \sum_i E_i(x,y) .

You can use a gradient descent starting it from each point.

Alexey Malistov
I'm 15-year-old kid. I love programming and I don't like Math. So I often don't understand these smart math things :)
BlaXpirit
@BlaXpirit, you can achieve a lot by programming if you don't know math a lot. But to effectively solve--and understand solutions of--tasks like the one you posted you should get better in maths. (By the way, Alexey is also using LaTeX notation of writing mathematical functions, it's not related to maths, but one of those things you'd better learn about)
Pavel Shved
@Alexey, it would be absolutely correct if `f` was `1 inside the circle, and 0 outside`. :-)
Pavel Shved
@Pavel, gradient descent does not work correctly for the case of your nondifferentiable function.
Alexey Malistov
@Alexey, exactly, but otherwise you risk to find a point that isn't a solution. In fact, I'm sure (though I didn't try to prove) that, for each continuous function, it's possible to construct an input for which the algorithm fails as badly as we wish.
Pavel Shved
@Pavel, I'm sure too. I started my answer with `If it is true that precision is not important...`.
Alexey Malistov
@Alexey, ok, might have missed it. Sorry for irrelevant comments.
Pavel Shved
+1  A: 

At a first glance, I would say a quad tree solution.

Also, there is an information visualization/Data mining method called K-means which makes clusters of given data. It can be used with added functionality in the end to fit your purpose.

The basic algorithm for K-Means is:

  1. Place K points into the space represented by the items that are being clustered - These points represent initial group centroids
  2. Assign each data item to the group that has the closest centroid
  3. When all items have been assigned, recalculate the positions of the K centroids by calculating the mean position of your dots
  4. Repeat Steps 2 and 3 until the centroids no longer move, or move very little

The added functionality would then be:

  1. Calculate number of points within your circle positioned at the K centroids
  2. Choose the one that suits you the best ;)

Sources: K-means algorithm: http://servus.itn.liu.se/courses/TNM048/Slides2009/InfVizTNM048%202009%20Lecture%204%20Large%20data.pdf - Linköping University

Quadtree: en.wikipedia.org/wiki/Quadtree

A quick search on wikipedia and you find source code: en.wikipedia.org/wiki/K-means_clustering

(sorry but i cant use more than one hyperlink in my post heheh...

+4  A: 

This is the "disk partial covering problem" in the literature -- that should give you a good place to start googling. Here's a paper covering one possible solution, but it is a little intense mathematically: http://www.utdallas.edu/~edsha/papers/bin/ISPAN04_cover.pdf

As a matter of fact, this falls in the area called computational geometry, which is fascinating but can be hard to get a toehold in. There's a good overview by deBerg on various algorithms related to the subject.

Joel
I don't think the algorithm can be used in this situation, because I have only one circle
BlaXpirit
In your case, the paper's "k" is 1.
Joel
+3  A: 

If you want something simple, take random position (x,y), calculate number of points inside circle and compare with previous position. Take the maximum. Repeat the operation any times you want.

Why the hell downvote? Ever heard about Monte Carlo methods? Actually for a huge amount of points, deterministic algorithm may not finish in reasonable time.

doc
Not the downvoter, but OP: "I need an algorithm that _determines_...."
maxwellb
@maxwellb: "Precision is not important and algorithm may do small mistakes"
doc
Reasonable time, also, is expressed in terms of a relationship with the number of points given. For example, O(n^3) means that the algorithm finishes in time proportional to the cube of the number of points.Often, "reasonable" time for an algorithm is: polynomial (n^3, n^8, n^k): OK, linear (n, or 3n, (1/9)n, kn, all are n): great, logarithmic: awesome.
maxwellb
@doc: Fair enough, and I think your point is valid, so I didn't downvote. Cheers.
maxwellb
For my situation, this solution isn't too bad
BlaXpirit
A: 

Might I suggest a density map? Find the min and max bounds for the x and y. Divide the range of the x and y bounds into bins having widths equal to the diameter of your circle. Count the number of points in each bin for the x and y separately. Now find the intersection on your density map between the highest ranked x bin with the highest ranked y bin.

This is a VERY fast algorithm to quickly generalize large data sets but it is not always accurate, to improve accuracy you can slice the bins into smaller and smaller pieces or shift the bin positions to the left or right n times and use a voting system to select the answer that occurs most often between trials.

Peter Hanneman
"not always accurate": http://imagebin.org/105002
maxwellb
Instead of large data sets, I have very small data sets. So this is incorrect here
BlaXpirit
A: 

You could pixelize the whole area, then go to each point and increase the value of all pixels within the circle of the radius around that point. The pixels with the highest sum are good candidates.

Of course, you might lose some good areas or "hallucinate" good areas due to rounding errors. Perhaps you could try to do a rough pixellation first, then refine the promising areas.

Svante