tags:

views:

2745

answers:

4

I currently have just under a million locations in a mysql database all with longitude and latitude information. With this I use another lat/lng to find the distance of certain places in the database but it's not as fast as I want it to be especially with 100+ hits a second. Is there a faster formula or possibly a faster system other than mysql for this? The formula I'm using is this.

select name, ( 3959 * acos( cos( radians(42.290763) ) * cos( radians( locations.lat ) ) * cos( radians( locations.lng ) - radians(-71.35368) ) + sin( radians(42.290763) ) * sin( radians( locations.lat ) ) ) ) AS distance from locations where active = 1 HAVING distance < 10 ORDER BY distance;
+5  A: 
  • Create your points using Point values of Geometry datatypes in MyISAM table

  • Create a SPATIAL index on these points

  • Use MBRContains() to find the values:

    SELECT  *
    FROM    table
    WHERE   MBRContains(LineFromText(CONCAT('(',
            @lon - 10 * ( 111.1 / cos(@lon)), ' ',
            @lat - 10 / 111.1, ', ',
            @lon + 10 * ( 111.1 / cos(@lon)), ' ',
            @lat + 10 / 111.1, ')'),
            mypoint)
    

    This will select all points approximately within the box (@lat +/- 10 km, @lon +/- 10km).

    This actually is not a box, but a spherical rectangle: latitude and longitude bound segment of the sphere. This may differ from a plain rectangle on the Franz Joseph Land, but quite close to it on most inhabited places.

  • Apply additional filtering to select everything inside the circle (not the square)

  • Possibly apply additional fine filtering to account for the big circle distance (for large distances)

Quassnoi
How good is the Spatial in mySQL, I remember using it a while ago and it didn't seem that great. Have they improved it in newer versions?
Ryan Detzel
@Ryan Detzel: In 5.0, this query will be a matter of milliseconds for 1,000,000 rows. Just run EXPLAIN on the query and make sure the SPATIAL index is used for coarse filtering.
Quassnoi
@Quassnoi:A couple corrections: You'll probably want to switch the order of the coordinates to lat, long. Also, longitudinal distances are proportional the cosine of the *latitude*, not longitude. And you'll want to change it from multiplication to division, so your first coordinate would be corrected as `@lon - 10 / ( 111.1 / cos(@lat))` (and be the second in the pair once everything was correct.
M. Dave Auayan
**WARNING** : The body of the answer has NOT been edited to accord with the very valid comment made by @M. Dave Auayan. Further notes: This method goes pearshaped if the circle of interest (a) includes a pole or (b) is intersected by the +/-180 degree meridian of longitude. Also using `cos(lon)` is accurate only for smallish distances. See http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
John Machin
+1  A: 

http://postgis.refractions.net/

You may have to look into this database that is optimized for geolocation storage.

siong1987
+3  A: 

Not a MySql specific answer, but it'll improve the performance of your sql statement.

What you're effectively doing is calculating the distance to every point in the table, to see if it's within 10 units of a given point.

What you can do before you run this sql, is create four points that draw a box 20 units on a side, with your point in the center i.e.. (x1,y1 ) . . . (x4, y4), where (x1,y1) is (givenlong + 10 units, givenLat + 10units) . . . (givenLong - 10units, givenLat -10 units). Actually, you only need two points, top left and bottom right call them (X1, Y1) and (X2, Y2)

Now your SQL statement use these points to exclude rows that definitely are more than 10u from your given point, it can use indexes on the latitudes & longitudes, so will be orders of magnitude faster than what you currently have.

e.g.

select . . . 
where locations.lat between X1 and X2 
and   locations.Long between y1 and y2;

The box approach can return false positives (you can pick up points in the corners of the box that are > 10u from the given point), so you still need to calculate the distance of each point. However this again will be much faster because you have drastically limited the number of points to test to the points within the box.

I call this technique "Thinking inside the box" :)

EDIT: Can this be put into one SQL statement?

I have no idea what mySql or Php is capable of, sorry. I don't know where the best place is to build the four points, or how they could be passed to a mySql query in Php. However, once you have the four points, there's nothing stopping you combining your own SQL statemen with mine.

select name, 
       ( 3959 * acos( cos( radians(42.290763) ) 
              * cos( radians( locations.lat ) ) 
              * cos( radians( locations.lng ) - radians(-71.35368) ) 
              + sin( radians(42.290763) ) 
              * sin( radians( locations.lat ) ) ) ) AS distance 
from locations 
where active = 1 
and locations.lat between X1 and X2 
and locations.Long between y1 and y2
having distance < 10 ORDER BY distance;

I know with MS SQL I can build a SQL statement that declares four floats (X1, Y1, X2, Y2) and calculates them before the "main" select statement, like I said, I've no idea if this can be done with MySql. However I'd still be inclined to build the four points in C# and pass them as parameters to the SQL query.

Sorry I can't be more help, if anyone can answer the MySQL & Php specific portions of this, feel free to edit this answer to do so.

Thanks to all,

Binary Worrier
P.S. I did this years ago for a parcel delivery sub system, and it worked a treat. Apologies if latitudes should be Y and Longitudes should be X :)
Binary Worrier
Interesting, so, is there a way to tie this into one function for a SQL query or would it be faster to just pull out the locations in the box and then use what ever language I'm using to calculate the actual distances?
Ryan Detzel
Depends on the volume of data to be honest. Consider that area of the box will be approx 27% larger than the circle your constrained to, so potentially you're going to bring back 27% more data than you need. If is is an extra 27 rows, I wouldn't sweat it, if it's going to be 27,000 then I'd have SQL weed them out.
Binary Worrier
Ryan: What do you mean by "is there a way to tie this into one function for a SQL query"?
Binary Worrier
@Binary Worrier: I mean, one select statement.
Ryan Detzel
You can find a mysql procedure for this approach in this presentation: http://www.scribd.com/doc/2569355/Geo-Distance-Search-with-MySQL
Lucia
+2  A: 

Check this presentation for a good answer. Basically it shows the two different approaches shown in the comments, with a detailed explanation on why/when you should use one or the other and why the "in the box" calculation can be very interesting.

Geo Distance Search with MySQL

illarra