views:

731

answers:

6

Given a set of n points on plane, I want to preprocess these points somehow faster than O(n^2) (O(nlog(n)) preferably), and then be able to answer on queries of the following kind "How many of n points lie inside a circle with given center and radius?" faster than O(n) (O(log(n) preferably).

Can you suggest some data structure or algorithm I can use for this problem?

I know that such types of problems are often solved using Voronoi diagrams, but I don't know how to apply it here.

+7  A: 

Build a KD-tree of the points, this should give you much better complexity than O(n), on average O(log(n)) I think.

You can use a 2D tree since the points are constrained to a plane.

Assuming that we have transformed the problem into 2D, we'll have something like this for the points:

 struct Node {
     Pos2 point;
     enum {
        X,
        Y
     } splitaxis;
     Node* greater;
     Node* less;
 };

greater and less contains points with greater and lesser coordinates respectively along the splitaxis.

 void
 findPoints(Node* node, std::vector<Pos2>& result, const Pos2& origin, float radius) {
     if (squareDist(origin - node->point) < radius * radius) {
         result.push_back(node->point);
     }
     if (!node->greater) { //No children
          return;
     }
     if (node->splitaxis == X) {
         if (node->point.x - origin.x > radius) {
             findPoints(node->greater, result, origin radius);
             return;
         }
         if (node->point.x - origin.x < -radius) {
             findPoints(node->less, result, origin radius);
             return;
         }
         findPoints(node->greater, result, origin radius);
         findPoints(node->less, result, origin radius);
     } else {
         //Same for Y
     }
 }

Then you call this function with the root of the KD-tree

Andreas Brinck
KD-trees are often useful, but I don't see how they can help in this case. Could you explain?
ShreevatsaR
@ShreevatsaR Done ;)
Andreas Brinck
Thanks for your answer Andreas. Your code finds all points inside circle, so I think it can't be faster than O(log(n)+k), where k is answer. But I need only the number k, not the points. Actually, I know how to modify it. We need to store the size of subtree in each node, and if all possible points inside subtree lie inside circle we just add the subtree's size to the answer without traversing the subtree. But I'm not sure if this will be better than O(N) in the worst case.
+1  A: 

Assuming you have a set of points S in a cartesian plane with coordinates (xi,yi), given an arbitrary circle with center (xc,yc) and radius r you want to find all the points contained within that circle.

I will also assume that the points and the circle may move so certain static structures that can speed this up won't necessarily be appropriate.

Three things spring to mind that can speed this up:

Firstly, you can check:

(xi-xc)^2 + (yi-yc)^2 <= r^2

instead of

sqrt((xi-xc)^2 + (yi-yc)^2) <= r

Secondly, you can cull the list of points somewhat by remembering that a point can only be within the circle if:

  • xi is in the range [xc-r,xc+r]; and
  • yi is in the range [yc-r,yc+r]; and

This is known as a bounding box. You can use it as either an approximation or to cut down your list of points to a smaller subset to check accurately with the first equation.

Lastly, sort your points in either x or y order and then you can do a bisection search to find the set of points that are possibly within your bounding box, further cutting down on unnecessary checks.

cletus
Only the last suggestion actually changes the asymptotic running time.
Ron Warholic
Yes but the last suggestion makes no sense outside the context of the other two.
cletus
This algorithm still has a worst case running time of O(n).
Andreas Brinck
(Say all points have the same y value and you've chosen to sort by y)
Andreas Brinck
Good luck finding a sub-O(n) solution to this problem in the worst case.
cletus
Bear in mind that lookup in a Java Map or C# Dictionary is O(n) in the worst case. It's just that worst case is rather unlikely so such structures are quoted as being "near O(1)" access as an exepcted case assuming semi-normal hash distribution. This problem too should be approached from an expected case point of view based on a realistic distribution of points.
cletus
Even based on totally uniform distribution of points this algorithm isn't O(log(n)). You can't use a hash map for your algorithm, you need an ordered map like `std::map` where you can find the valid range in O(log(n)) time, not O(1) time.
Andreas Brinck
And when you have the range, you'll it will (probably) include more than 1 point, how many of course depends on the radius of the circle.
Andreas Brinck
Thanks cletus. I understand your idea but it is still too slow if for example all n points are inside circle and all points have distinct x and y coordinates.
Why did this get upvotes? It does not give any (valid) practical suggestion to solve the problem.
kigurai
A: 

depending on how much precomputing time you have, you could build a tree like this:

first node branches are x-values, below them are y-value nodes, and below them are radius value nodes. at each leaf is a hashset of points.

when you want to compute the points at x,y,r: go through your tree and go down the branch that matches your x,y values the closest. when you get down to the root level, you need to do a little match (constant time stuff), but you can find a radius such that all the points in that circle (defined by the path in the tree) are inside the circle specified by x,y,r, and another circle (same x_tree,y_tree in the tree as before, but different r_tree) such that all of the points in the original circle (specified by x,y,r) are in that circle.

from there, go through all the points in the larger of the two tree circles: if a point is in the smaller circle add it to the results, if not, run the distance check on it.

only problem is that it takes a very long time to precompute the tree. although, you can specify the amount of time you want to spend by changing how many x,y, and r branches you want to have in your tree.

twolfe18
+9  A: 

Build a spatial subdivision structure such as a quadtree or KD-tree of the points. At each node store the amount of points covered by that node. Then when you need to count the points covered by the lookup circle, traverse the tree and for each subdivision in a node check if it is fully outside the circle, then ignore it, if it is fully inside the circle then add its count to the total if it intersects with the circle, recurse, when you get to the leaf, check the point(s) inside the leaf for containment.

This is still O(n) worst case (for instance if all the points lie on the circle perimeter) but average case is O(log(n)).

Ants Aasma
OK, looks like there's no way to do this faster than O(n) in the worst case. Then O(log(n)) on average will probably be enough for me. Thanks!
A: 

Man, the Challange is to do it yourself!

Dave
What is that challenge, and how it is related to my question?
I was making a joke. It just so happens, that by answering your question, one gets the solution to the above challenge^
Dave
+2  A: 

If my goal is speed, and the number of points weren't huge (millions,) I'd focus on memory footprint as much as algorithmic complexity.

An unbalanced k-d tree is best on paper, but it requires pointers, which can expand memory footprint by 3x+, so it is out.

A balanced k-d tree requires no storage, other than for an array with one scalar for each point. But it too has a flaw: the scalars can not be quantized - they must be the same 32 bit floats as in the original points. If they are quantized, it is no longer possible to guarantee that a point which appears earlier in the array is either on the splitting plane, or to its left AND that a point which appears later in the array is either on the splitting plane, or to its right.

There is a data structure I developed that addresses this problem. The Synergetics folks tell us that volume is experientially four-directional. Let's say that a plane is likewise experientially three-directional.

We're accustomed to traversing a plane by the four directions -x, +x, -y, and +y, but it's simpler to use the three directions a, b, and c, which point at the vertices of an equilateral triangle.

When building the balanced k-d tree, project each point onto the a, b, and c axes. Sort the points by increasing a. For the median point, round down, quantize and store a. Then, for the sub-arrays to the left and right of the median, sort by increasing b, and for the median points, round down, quantize, and store b. Recurse and repeat until each point has stored a value.

Then, when testing a circle (or whatever) against the structure, first calculate the maximum a, b, and c coordinates of the circle. This describes a triangle. In the data structure we made in the last paragraph, compare the median point's a coordinate to the circle's maximum a coordinate. If the point's a is larger than the circle's a, we can disqualify all points after the median. Then, for the sub-arrays to the left and right (if not disqualified) of the median, compare the circle's b to the median point's b coordinate. Recurse and repeat until there are no more points to visit.

This is similar in theme to the BIH data structure, but requires no intervals of -x and +x and -y and +y, because a, b, and c are just as good at traversing the plane, and require one fewer direction to do it.

bmcnett