ansaurus

Question

Answer 1

+2 A:

I have used scipy.spatial before, and it appears to be a nice improvement (especially wrt the interface) as compared to scikits.ann.

In this case I think you have confused the return from your tree.query(...) call. From the scipy.spatial.KDTree.query docs:

Returns
-------

d : array of floats
    The distances to the nearest neighbors.
    If x has shape tuple+(self.m,), then d has shape tuple if
    k is one, or tuple+(k,) if k is larger than one.  Missing
    neighbors are indicated with infinite distances.  If k is None,
    then d is an object array of shape tuple, containing lists
    of distances. In either case the hits are sorted by distance
    (nearest first).
i : array of integers
    The locations of the neighbors in self.data. i is the same
    shape as d.

So in this case when you query for the nearest to [1,1] you are getting:

distance to nearest: 0.0
index of nearest in original array: 0

This means that [1,1] is the first row of your original data in array, which is expected given your data is y = x on the range [1,50].

The scipy.spatial.KDTree.query function has lots of other options, so if for example you wanted to make sure to get the nearest neighbour that isn't itself try:

tree.query([1,1], k=2)

This will return the two nearest neighbours, which you could apply further logic to such that cases where the distance returned is zero (i.e. the point queried is one of data items used to build the tree) the second nearest neighbour is taken rather than the first.

dtlussier 2010-10-06 17:33:22

Thanks kindly. Makes a lot more sense now!

jlv 2010-10-06 19:39:46

ansaurus

tags:

views:

answers:

Data type problem using scipy.spatial

related questions