views:

52

answers:

2

I currently have a Postgres DB filled with approx. 300.000 data-sets of moving vehicles all over the world. My very frequently repeated query is: Give me all vehicles in a 5/10/20mile radius. Currently I spend around 600 to 1200 ms in the DB to prepare the set of located vehicle-objects.

I am looking to vastly improve this time by ideally one or two orders of magnitude if possible. I am working in a Ruby on Rails 3.0beta environment if this is relevant.

Any ideas how to architect the whole system to accelerate this query? Any NoSQL database able to deliver this kind of geolocation performance? I know of MongoDB working on an extension to facilitate this scenario but haven't tried it yet. Any intelligent use of Redis to achieve this?

One problem with SQL-DBs here seems to be that I can't possibly use indexes because my vehicles are mostly moving around, meaning I had to constantly created DB indexes which, by itself, is probably more expensive than just doing the searching without index.

Looking forward to your thoughs, Thanks!

A: 

Even though your vehicles are moving around all the time, I assume they have some kind of speed limit. What you can do is to create some kind of discrete coordinate system, one example would be the integer part of the lat/long coordinate. Then you put those values in separate columns, keeping the exact location in another column. You should then be able to index the integer columns, as the vehicles won't move so much that they change those values very often.

When doing a search, you first find out what "squares" are interesting, and restrict your query to the vechicles within those sqeares, using the indexed columns. Then you have to do a full search of all vehicles within each square. The number of vehicles you have to do a full search over should now only be a small fraction of all vechiles. The efficiency of this strategy of course depends on the distribution of your vechiles. If 50% of them are in a certain city somewhere this will not work, but assuming the largest group of vehicles in one place is 5-10% it should improve performance.

Anders Abel
Thanks, great idea to get quick results!
itsme
+1  A: 

If you use the right algorithm for organizing your data, you will be able to use a spatial index which can dramatically speed up your queries.

The best practice for the geolocation domain is to use a geohash, quad-tree, R-tree or similar data structure (R-trees are the most generic, but it sounds like you're querying point data, so that may not matter). In each case, you can create a spatial index that uses a single, linear column where each value represents a bounding box of varying size and shape. This should let you answer most queries with a single range query in your database. Spatial indices can be implemented in SQL (PostGIS, MS SQL, MySQL all have spatial datatypes and spatial indices which use one of these techniques) or NoSQL (popular for its horizontal scalability; AppEngine has geomodel, SimpleGeo uses Cassandra, Foursquare uses MongoDB).

Using an index can be complicated by constantly moving points, but I would suspect that writes, even slightly heavier writes that update indices, wouldn't be your bottleneck.

npdoty
Thanks for the algorithm-pointers, I'll check them out. Any idea what kind of improvement I may get with the best solution?
itsme
That will depend _a lot_ on the details of your particular situation. Let us know how it works out for you.
npdoty