views:

232

answers:

2

I just finished implementing a kd-tree for doing fast nearest neighbor searches. I'm interested in playing around with different distance metrics other than the Euclidean distance. My understanding of the kd-tree is that the speedy kd-tree search is not guaranteed to give exact searches if the metric is non-Euclidean, which means that I might need to implement a new data structure and search algorithm if I want to try out new metrics for my search.

I have two questions:

  1. Does using a kd-tree permanently tie me to the Euclidean distance?
  2. If so, what other sorts of algorithms should I try that work for arbitrary metrics? I don't have a ton of time to implement lots of different data structures, but other structures I'm thinking about include cover trees and vp-trees.
+2  A: 

The nearest-neighbour search procedure described on the Wikipedia page you linked to can certainly be generalised to other distance metrics, provided you replace "hypersphere" with the equivalent geometrical object for the given metric, and test each hyperplane for crossings with this object.

Example: if you are using the Manhattan distance instead (i.e. the sum of the absolute values of all differences in vector components), your hypersphere would become a (multidimensional) diamond. (This is easiest to visualise in 2D -- if your current nearest neighbour is at distance x from the query point p, then any closer neighbour behind a different hyperplane must intersect a diamond shape that has width and height 2x and is centred on p). This might make the hyperplane-crossing test more difficult to code or slower to run, however the general principle still applies.

j_random_hacker
That's a great answer. Does evert metric have an associated metric? Are there any rules abut the shapes that correspond to different metrics?
James Thompson
@James: The rule is that the shape is always formed by the set of points that are at distance x from the query point. So e.g. for the Euclidean distance in 2D this is a circle; for Manhattan, a diamond. For a weird metric this might not be any "recognisable" shape.
j_random_hacker
+2  A: 

I don't think you're tied to euclidean distance - as j_random_hacker says, you can probably use Manhattan distance - but I'm pretty sure you're tied to geometries that can be represented in cartesian coordinates. So you couldn't use a kd-tree to index a metric space, for example.

Nick Johnson
I see what you mean. A metric is often given with an embedding in a Cartesian space, which my answer assumed -- but in the most general case, you can't assume that every object can be represented as a point in Cartesian space, and yes, in that case KD-trees won't work.
j_random_hacker