views:

612

answers:

5

We have a list of x,y pairs. Every pair represents a point on a 2D space. I want to find the closest point from this list, to a specific point xq,yq. What is the best performance-critical algorithm for this problem? Lisp of points is not going to change; which means I do not need to perform insertion and deletion. I want just to find the nearest neighbor of a target xq,yq point in this set.

Edit 1: Thanks to all! As Stephan202 has guessed correctly, I want to do this repeatedly; like a function. An the list is not necessarily sorted (In fact I do not understand how can it be sorted? Like a table with a primary key of 2 columns a and y? If that helps then I will sort it).

I will construct the data structure based on the list one time, and then I will use this generated data structure in the function (if this process itself is relevant).

Thank you Jacob; It seems that KD-Tree data structure is a good candidate for being the answer (And I feel it is. I will update when I get some relevant results).

Edit 2: I have found that, this problem is named "nearest neighbor"!

Edit 3: First title was "In Search of an Algorithm (for Spatial-Querying and Spatial-Indexing) (Nearest Neighbor)"; I have chose a new title: "Best Performance-Critical Algorithm for Solving Nearest Neighbor". Since I do not want to perform insertion and deletion operation on my initial data and I want just the nearest one from them to a new point (which is not going to be inserted), I chose to (currently) working on KD-Trees. Thanks to all!

A: 

Iterate through every other point using the distance formula to find the minimum distance from Q (xq,yq).

However, you haven't given enough information for a performance-critical answer.

For example, if Q is a VERY common point, you might want to calculate the distance to Q and store it with each point.

Second example, if you have a huge number of points, you might organize the points into sections and start with points only in the same section and adjacent sections to the section containing Q.

Steven
+5  A: 

As Stephan202 noted, if you plan to find the closest-match for more than one point, you should use a tree.

I would recommend a KD-tree, whose implementation can be easily found in several packages like OpenCV 2.0. Or you could implement one yourself!

EDIT: I had asked a question about kd-tree implementations here - might be useful.

EDIT: KD-trees have been widely used successfully for NN searches :) - Also, if you're willing to accept approximate matches, you can use Fast Library for Approximate Nearest Neigbor (FLANN). The FLANN implementation is present in OpenCV 2.0.

If you don't want approximate answers you can tweak the FLANN parameters to search the entire tree.

Jacob
+1 KD trees are built for this
I was thinking of suggesting them as well, glad I took the time to look at the answers already suggested :)
Matthieu M.
KD trees aren't built for this in the same way as some data structures are. If you find out that the query point is in the cell for point P, you still need to check all of the neighboring KD-tree cells, since any of those could also be the closest point.
jprete
Thanks for responding jprete! I have described my problem in more details which is in fact a "Nearest Neighbor" problem. What algorithm/data structure are you pointing to? (Which you think are better than KD-Trees for solving this).
Kaveh Shahbazian
I have implemented my KD Tree in C# and for uniform GIS position data it is pretty fast (.002 ms on my 2.66 6MB 4GB Studio XPS 16). But for real data, if the query position is far from real areas it goes slow (uo to 500 ms). But since I needed answers in a specific area, I have tuned the KD Tree to apply a max distance constraint; so it remained pretty fast! Of course my problem has it's own attributes: in-memory, no change in set of GIS data (no insert, update, delete node); yet I'v got a more than acceptable result!
Kaveh Shahbazian
Great! Good to know :)
Jacob
+4  A: 
jprete
+4  A: 

You need a spatial index.

If you roll your own, you can do a lot worse than picking the R-Tree or Quad-tree algorithms.

Will
I had not time to read about quadtree much but as far as I studied R-Tree; It is for indexing multi-dimensional data which 1) will be persisted (like in a database, not in-memory) 2) and set of data change (insert, update and delete); neither of them was properties of my problem (KD-Trees are hard to balance too; so they are not proper instead of R-Trees and vice versa). Thanks
Kaveh Shahbazian
I think you should take more time to read about the R-Tree, and then look at the quadtree.If you can't roll your own, just use someone else's. Lots of databases offer GIS functionality.
Will
+1  A: 

I would go with a quadtree. It is the most simple spatial structure. In 2 dimensions i would generally recommend quadtree instead of kd-tree, because it is simpler, faster. Its downside is more memory consumption if the number of dimensions is high, but in case of 2 dimensions the difference is not significant.

There is a nice optimization trick if your coordinates are floating point typed: In a query you first will have to find the leaf-node that contains the point to which the most near point is asked. To do this you will have to go in the tree from the root to the leaf - in each iteration deciding which child-node to step on. Store the identifiers/addresses of the child-nodes in a 4-sized array in the Node structure. Digitize the point coordinates in the query algorithm. Then you will be able to find the proper sub-node just by indexing the array by 2 proper bits of the digitized point coordinates. Digitizing is fast: implement it with a simple static_cast.

But first implement the quadtree without optimization because it is easy to make a bug with the bit-operations. Even without this optimization, it still will be the fastest solution.

Zoli