views: 22

answers: 2
I searched on SO and didn't really find an answer to this but it seems like a common problem.

I have several hundred thousand locations in a database, each with a geocode (lat/long). If it matters, they are spread out across the U.S. Now, I have a client app in which I want users to give me their lat/long and a radius (say 5mi, 10mi, 25mi, etc.) and I want to return all the records that match. I only care about the distance value that can be obtained via, say, the Haversine formula, not shortest road distance. However, given that, I want it to be as accurate as possible.
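For reference, the Haversine great-circle distance between two points (lat1, lon1) and (lat2, lon2), with all angles in radians and R the Earth's mean radius (about 3959 mi / 6371 km), is:

    d = 2 * R * asin( sqrt( sin((lat2 - lat1) / 2)^2
                          + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)^2 ) )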

This database is mostly read-only. On a good day, there might be 10 inserts. Now, I will have hundreds of clients, maybe tens of thousands, using the software. I want users to get results in a few seconds, but if a single query takes 10-20 seconds, the whole thing will crawl under load from many clients.

How do I serve up results as efficiently as possible? I know I could just store them in MySQL or PostgreSQL (Oracle and MS SQL Server are out for this, but some other open-source data store may be fine) and just put the Haversine formula in the WHERE clause, but I don't think that is going to yield efficient results.
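For concreteness, the naive version I have in mind looks roughly like this (the table name locations, the lat/lng columns, and the example center point and 25-mile radius are made up for illustration), with a cheap bounding-box pre-filter so an index on (lat, lng) can prune most rows before the trig runs:

    SELECT * FROM (
      SELECT id, name,
             2 * 3959 * ASIN(SQRT(
                 POWER(SIN(RADIANS(lat - 40.7128) / 2), 2)
               + COS(RADIANS(40.7128)) * COS(RADIANS(lat))
                 * POWER(SIN(RADIANS(lng - (-74.0060)) / 2), 2)
             )) AS distance_mi
      FROM locations
      -- bounding box: 1 degree of latitude is ~69 miles; longitude
      -- degrees shrink by cos(latitude)
      WHERE lat BETWEEN 40.7128 - (25.0 / 69.0)
                    AND 40.7128 + (25.0 / 69.0)
        AND lng BETWEEN -74.0060 - (25.0 / (69.0 * COS(RADIANS(40.7128))))
                    AND -74.0060 + (25.0 / (69.0 * COS(RADIANS(40.7128))))
    ) AS candidates
    WHERE distance_mi <= 25
    ORDER BY distance_mi;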

A: 

I'm starting to feel like the official spokesman for Sphinx. This article explains how to configure it for geospatial searching: http://www.god-object.com/2009/10/20/geospatial-search-using-sphinx-search-and-php/

To clarify: the data will be stored in mysql/postgres but indexed by and searched via Sphinx.
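For concreteness, a radius query in SphinxQL looks roughly like this (the linked article goes through the PHP API instead; the index name locations_idx and the lat_rad/lon_rad float attributes are assumptions, coordinates are stored in radians, and GEODIST() returns meters, with 25 mi being roughly 40,234 m):

    SELECT id, GEODIST(lat_rad, lon_rad, 0.7106, -1.2917) AS dist
    FROM locations_idx
    WHERE dist < 40234.0
    ORDER BY dist ASC
    LIMIT 0, 100;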

bemace
+1  A: 

PostgreSQL supports a wide range of geospatial queries, as long as it has the PostGIS extension installed. Nearest-neighbor, radius, and bounding-box searches are particularly easy.
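Roughly, assuming a locations table with lat/lng columns (all names here are placeholders): add a geography column and a GiST index once, and ST_DWithin then does an indexed radius search in meters.

    -- One-time setup: geography column derived from the stored lat/long,
    -- plus a GiST index so radius searches don't scan the whole table.
    ALTER TABLE locations ADD COLUMN geom geography(Point, 4326);
    UPDATE locations
       SET geom = ST_SetSRID(ST_MakePoint(lng, lat), 4326)::geography;
    CREATE INDEX locations_geom_idx ON locations USING GIST (geom);

    -- Radius search: ST_DWithin on geography takes meters (25 mi ~ 40,234 m).
    SELECT id, name,
           ST_Distance(geom,
                       ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)::geography) AS meters
    FROM locations
    WHERE ST_DWithin(geom,
                     ST_SetSRID(ST_MakePoint(-74.0060, 40.7128), 4326)::geography,
                     40234)
    ORDER BY meters;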

TokenMacGuy