tags:

views:

116

answers:

4

Hi there,

Trying to create a KNN search using a KD-tree. I can form the KD-tree fine (or at least, I believe I can!). My problem is that I am searching to find the closest 2 neighbours to every point in a list of points.

So, is there a method to find the K nearest neighbours to a point using a KD tree even if the point is actually IN the tree, or do I need to construct a seperate KD tree for each point, leaving out the point that I wish to search for?

My implementation language is C++, but I am more looking for either an algorithm or general help, thanks!

Thanks, Stephen

A: 

This isn't really much of an answer, but I can't fit what I want to paste into a comment. Anyhow, here's the relevant text from Wikipedia:

The algorithm can be extended in several ways by simple modifications. It can provide the k-Nearest Neighbors to a point by maintaining k current bests instead of just one. Branches are only eliminated when they can't have points closer than any of the k current bests.

It can also be converted to an approximation algorithm to run faster. For example, approximate nearest neighbour searching can be achieved by simply setting an upper bound on the number points to examine in the tree, or by interrupting the search process based upon a real time clock (which may be more appropriate in hardware implementations). Nearest neighbour for points that are in the tree already can be achieved by not updating the refinement for nodes that give zero distance as the result, this has the downside of discarding points that are not unique, but are co-located with the original search point.

Approximate nearest neighbor is useful in real time applications such as robotics due to the significant speed increase gained by not searching for the best point exhaustively. One of its implementations is Best Bin First.

Steven Sudit
+3  A: 

If you want the K exact nearest-neighbors within your tree, just query the tree for K+1 neighbors (obviously since the first nearest neighbor will be your query).

Jacob
+1 and a side note: it does not matter for the search if the searched point is in the tree. So this approach will work fine, you will only have to filter out the searched point from the results, which should be very easy (since it's the closest one).
PeterK
A: 

I'm afraid I can't work out how to reply to posts (might have something to do with not being the official asker of the question - I wasnt logged in when I posted it...), so I'm just posting this as an answer and I hope people will see it.

Firstly, thank you very much to everyone who pointed me in the direction of the wikipedia article - it has been invaluable.

However, it has led me to have another question, which I'm not sure if I should ask here or make another topic for. But I shall ask here regardless, and I'm sure I'll get told off if it's the wrong thing to do!

On the wiki page, in the description of the NN algorithm, it says

If the hypersphere crosses the plane, there could be nearer points on the other side of the plane, so the algorithm must move down the other branch of the tree from the current node looking for closer points, following the same recursive process as the entire search. If the hypersphere doesn't intersect the splitting plane, then the algorithm continues walking up the tree, and the entire branch on the other side of that node is eliminated.

This is represented in their LUA code example by:

-- search the near branch --
child = child_near(here,point)

And:

-- search the away branch - maybe -- ...
    child = child_away(here,point)

I really don't get the 'child near' and 'child_away' parts - at first from the text description I thought it meant that I should traverse over to the equivalent level node on the opposite branch from the current node, but does this mean instead that one of the node's children is the "near" node, and one of them is the "away" node?

I apologise, but I'm very confused about this.

-Stephen

Stephen
The "near" node and "away" node are relevant only to the search. When you get to a splitting point, you have to decide which child will get checked first for neighbours. The one which is searched first is called the "near" child. If then the algorithm decides that there could be better neighbours in the other subtree, it goes also into the second child, which is called "away" child.
PeterK
A: 

I'd recommend taking a look at ANN for implementation details http://www.cs.umd.edu/~mount/ANN/

It's designed for approximate nearest neighbor searches, but can also do exact nearest neighbor searches. It's also some of the clearest and best written code I've ever found, and should give you some insight even if you want your own implementation.

Meekohi