What kind of data structure could be used for an efficient nearest neighbor search in a large set of geo coordinates? With "regular" spatial index structures like R-Trees that assume planar coordinates, I see two problems (Are there others I have overlooked?):

  • Wraparound at the poles and the International Date Line
  • Distortion of distances near the poles

How can these factors be allowed for? I guess the second one could be compensated for by transforming the coordinates. Can an R-Tree be modified to take wraparound into account? Or are there specialized geo-spatial index structures?

A: 

Could you use a locality-sensitive hashing (LSH) algorithm in 3 dimensions? That would quickly give you an approximate neighboring group which you could then sanity-check by calculating great-circle distances.

Here's a paper describing an algorithm for efficient LSH on the surface of a unit d-dimensional hypersphere. Presumably it works for d=3.
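
To make the "3 dimensions, then check great-circle distance" idea concrete, here is a minimal Python sketch. It is not an LSH implementation: a brute-force scan over 3D unit vectors stands in for whatever approximate index (LSH, k-d tree, ...) you would really use. The function names and the mean Earth radius of 6371 km are my own assumptions. The point is that in 3D Cartesian space there is no seam at the date line and no stretching near the poles, and the straight-line (chord) distance orders points the same way the great-circle distance does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def to_unit_vector(lat_deg, lon_deg):
    """Map a (lat, lon) pair in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def great_circle_km(a, b):
    """Haversine distance between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def nearest(query, points):
    """Pick the candidate with the smallest 3D chord distance, then
    report its great-circle distance as the sanity check."""
    q = to_unit_vector(*query)
    # Placeholder for the index lookup: a linear scan over all points.
    best = min(points, key=lambda p: math.dist(q, to_unit_vector(*p)))
    return best, great_circle_km(query, best)


points = [(0.0, -179.9), (0.0, 170.0), (10.0, 179.0)]
best, km = nearest((0.0, 179.9), points)
# best is the point just across the date line, roughly 22 km away.
```

Note that the winner here sits on the other side of the date line, which a naive comparison of longitude values would have ranked last.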

Josh McFadden
A: 

Take a look at Geohash.
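
For reference, a small Python sketch of the standard Geohash encoding (interleaved longitude/latitude bisection bits, five bits per base-32 character). The function name and the default precision are my own choices, not part of any particular library. It also shows a known caveat: two points a few metres apart on either side of a cell boundary, here the prime meridian, share no prefix at all, so a prefix match alone cannot be the whole neighbor search.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a (lat, lon) pair in degrees as a Geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, nbits = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        nbits += 1
        if nbits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, nbits = 0, 0
    return "".join(chars)


west = geohash_encode(51.0, -0.0001)  # starts with "g"
east = geohash_encode(51.0, 0.0001)   # starts with "u"
```

A common workaround is to look up the query cell plus its eight neighboring cells rather than relying on a single prefix.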

Also, to compensate for wraparound, simply use not one but three R-trees, each built over a differently (orthogonally) oriented coordinate system, so that no point on the Earth's surface lies on the wraparound seam of all three trees at once. Then, two points are close if they are close according to at least one of these trees.
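
One way to read "three orthogonal R-trees" is sketched below in Python. This is my interpretation, not a definitive construction: each point gets (lat, lon) coordinates in three frames whose polar axes are the z, x and y axes respectively, and each frame's coordinates would be fed to its own planar R-tree. No R-tree is built here; the helper names are hypothetical, and plain planar distance in degrees stands in for the tree query.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Map a (lat, lon) pair in degrees to a point on the unit sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def frame_coords(lat_deg, lon_deg):
    """Return (lat, lon) in degrees for three frames whose polar axes
    are z (the usual one), x and y. Each frame has its own +/-180 seam."""
    x, y, z = to_unit_vector(lat_deg, lon_deg)
    frames = []
    for a, b, c in ((x, y, z), (y, z, x), (z, x, y)):
        # c is the component along this frame's polar axis.
        lat = math.degrees(math.asin(max(-1.0, min(1.0, c))))
        lon = math.degrees(math.atan2(b, a))
        frames.append((lat, lon))
    return frames


def planar_distances(p, q):
    """Naive planar distance (in degrees) between p and q in each frame."""
    return [math.hypot(la1 - la2, lo1 - lo2)
            for (la1, lo1), (la2, lo2) in zip(frame_coords(*p), frame_coords(*q))]


# Two points 0.2 degrees apart, straddling the date line.
d = planar_distances((0.0, 179.9), (0.0, -179.9))
# The usual frame (d[0]) sees them almost 360 degrees apart,
# but the smallest of the three values is about 0.2 degrees.
```

Planar distances are still stretched near each frame's own poles, so a hit from any one tree should be confirmed with a true great-circle distance before it is accepted.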

jkff
Geohash seems to be a "works pretty well most of the time" kind of thing, but cannot be relied on to always provide a common prefix for nearby locations. However, the idea of using several R-Trees looks like a good solution for the wraparound problem.
Michael Borgwardt