views:

132

answers:

3

Hi all

I have a scenario, where I have x million longitude latitude points.

When a new long/lat point is added I want to know efficiently which other points are within a user configured distance parameter, so I can add them to a list.

got anything better than bounding boxes?

I would love to see algorithms, references and a few implementations ;) thank you kindly!

+1  A: 

A colleague told me that he had good experience with using Morton-Code as a spatial index on GIS data, maybe that is something worth investigating.

André
I've used Morton codes in a database with tens of millions of records - they work well.
Stephen Nutt
+2  A: 

There are quite a few options that are better, mostly based around space partitioning.

A common, and often very good option (which isn't too tough to implement) is to use a KD-Tree. Quadtrees are easier to implement, but slower for searching. Depending on the distribution of your data, and your requirements, other space partitioning algorithms may perform better, have lower memory requirements, or other issues that are related.

Reed Copsey
I definitely agree he wants to do some space partitioning. He will have to modify the quadtree concept to get it to work, though, since it is intended for two-dimensional space in which the regions are rectangular. He will also need to worry about wrapping, as Nosredna rightly points out.
PeterAllenWebb
Yeah, but quadtrees and kd trees can be used in these situations. Quadtrees are easier, since handling the wrapping gets much easier in this case. Typically, though, you're not dealing with a global case, in situations like these, but a smaller region, in which case most of these issues are less problematic.
Reed Copsey
A: 

This quick-and-dirty approach may save you some grief: Divide the surface of the earth into 1 degree boxes. You will then have a 180x360 element array and you will only need to search a small number of boxes, including the box containing the new point and all the boxes immediately around it for which one of the corners is within the user-specified distance. You will find that there are some tricks you can use to quickly figure out what boxes to use without considering them all. Just don't forget latitude and longitude wrap-around.

If your "only" have millions of points, and they aren't clustered into hot-spots, that might get you through.

A theoretically superior way: You could map each point into three dimensional space and then store them in an octree, which would let you quickly find nearby points to within an arbitrary distance. Of course, the distance in three-dimensional space will be slightly different than the great-circle distance on the globe, so you will have to calculate a conversion factor. That should be simple, though. You don't mention an implementation language, but there is almost certainly going to be a well-tested octree implementation for any language you are working in. If you don't mind inserting the third-party code, this solution is the way to go.

PeterAllenWebb