
I have to query a database of thousands of entries and order the results by their distance from a specified point.

The issue is that each entry has a latitude and longitude, and I would need to retrieve each entry to calculate its distance. With a large database, I don't want to retrieve every row, as this may take some time.

Is there any way to build this into the MySQL query so that I only need to retrieve the nearest 15 entries?

E.g.

`SELECT events.id, calcDistance($latlng, events.location) AS distance FROM events ORDER BY distance LIMIT 0,15`

    function calcDistance($old, $new){
       //Calculates the distance between $old and $new
    }
A: 

I think stored procedures are what you're looking for.

oezi
+1  A: 

Is this what you're looking for? http://zcentric.com/2010/03/11/calculate-distance-in-mysql-with-latitude-and-longitude/

jasper
+4  A: 

Option 1: Do the calculation on the database by switching to a database that supports GeoIP.

Option 2: Do the calculation on the database using a stored procedure like this:

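    DELIMITER //

    CREATE FUNCTION calcDistance (latA DOUBLE, lonA DOUBLE, latB DOUBLE, lonB DOUBLE)
        RETURNS DOUBLE DETERMINISTIC
    BEGIN
        -- Local variables, so nothing leaks into the session like @user variables would
        DECLARE RlatA, RlonA, RlatB, RlonB, deltaLat, deltaLon, d DOUBLE;
        SET RlatA = RADIANS(latA);
        SET RlonA = RADIANS(lonA);
        SET RlatB = RADIANS(latB);
        SET RlonB = RADIANS(lonB);
        SET deltaLat = RlatA - RlatB;
        SET deltaLon = RlonA - RlonB;
        -- Haversine formula
        SET d = SIN(deltaLat/2) * SIN(deltaLat/2) +
                COS(RlatA) * COS(RlatB) * SIN(deltaLon/2) * SIN(deltaLon/2);
        -- 6371.01 = Earth's radius in km, so the result is in km
        RETURN 2 * ASIN(SQRT(d)) * 6371.01;
    END//

    DELIMITER ;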
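If you want to sanity-check what calcDistance returns, or compute distances for rows you have already fetched, the same haversine formula ports directly to PHP. This is a minimal sketch: the function name `haversineKm` is hypothetical, and like the SQL version it assumes a spherical Earth with a radius of 6371.01 km.

```php
<?php
// Hypothetical PHP port of the calcDistance stored function:
// great-circle (haversine) distance on a sphere of radius $radius (km by default).
function haversineKm(float $latA, float $lonA, float $latB, float $lonB, float $radius = 6371.01): float
{
    $rLatA = deg2rad($latA);
    $rLatB = deg2rad($latB);
    $deltaLat = $rLatA - $rLatB;
    $deltaLon = deg2rad($lonA) - deg2rad($lonB);

    // Same intermediate value as d in the SQL version
    $d = sin($deltaLat / 2) ** 2
       + cos($rLatA) * cos($rLatB) * sin($deltaLon / 2) ** 2;

    return 2 * asin(sqrt($d)) * $radius;
}

// One degree of longitude along the equator: prints 111.195
echo round(haversineKm(0.0, 0.0, 0.0, 1.0), 3), PHP_EOL;
```

Both versions should agree to within floating-point rounding, since they evaluate the same expression.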

If you have an index on latitude and longitude in your database, you can reduce the number of distance calculations by working out an initial bounding box in PHP ($minLat, $maxLat, $minLong and $maxLong) and limiting the rows to a subset of your entries based on that (WHERE latitude BETWEEN $minLat AND $maxLat AND longitude BETWEEN $minLong AND $maxLong). MySQL then only needs to execute the distance calculation for that subset of rows.

If you're simply using a stored procedure to calculate the distance, then MySQL still has to look through every record in your database and calculate the distance for each one before it can decide whether to return that row or discard it.

Because the calculation is relatively slow to execute, it is better to reduce the set of rows it has to run on, eliminating rows that clearly fall outside the required distance, so that the expensive calculation only runs for a smaller number of rows.

If you consider that what you're doing is basically drawing a circle on a map, centred on your initial point, with a radius equal to your distance, then the formula simply identifies which rows fall within that circle... but it still has to check every single row.

Using a bounding box is like first drawing a square on the map, with the left, right, top and bottom edges at the appropriate distance from our centre point. Our circle is then drawn within that box, with its northernmost, easternmost, southernmost and westernmost points touching the edges of the box. Rows that fall outside the box are excluded, so MySQL doesn't even bother calculating the distance for them; it only calculates the distance for the rows inside the bounding box, to see whether they also fall within the circle.

Within your PHP (I'm guessing you're using PHP, from the $ variable names), we can use a very simple calculation that works out the minimum and maximum latitude and longitude based on our distance, then put those values in the WHERE clause of your SQL statement. This is effectively our box, and anything that falls outside it is discarded without any need to calculate its distance.
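A minimal sketch of that box calculation, assuming a spherical Earth of radius 6371.01 km (matching calcDistance). The function name `boundingBox` is hypothetical. The latitude bounds are simply the point's latitude plus or minus the angular radius; the longitude half-width uses asin(sin(r) / cos(lat)), which widens the box away from the equator so that it still contains the whole circle. It does not handle boxes that cross the ±180° meridian.

```php
<?php
// Hypothetical helper: returns [$minLat, $maxLat, $minLong, $maxLong] in degrees,
// a box that fully contains the circle of $distanceKm around ($lat, $lon).
// Limitation: $minLong/$maxLong may fall outside ±180 near the antimeridian.
function boundingBox(float $lat, float $lon, float $distanceKm, float $radiusKm = 6371.01): array
{
    $r = $distanceKm / $radiusKm;        // angular radius in radians
    $minLat = $lat - rad2deg($r);
    $maxLat = $lat + rad2deg($r);

    $sinRatio = sin($r) / cos(deg2rad($lat));
    if ($minLat <= -90 || $maxLat >= 90 || $sinRatio >= 1) {
        // A pole lies inside the circle, so every longitude qualifies.
        return [max($minLat, -90.0), min($maxLat, 90.0), -180.0, 180.0];
    }

    $dLon = rad2deg(asin($sinRatio));    // widest east-west extent of the circle
    return [$minLat, $maxLat, $lon - $dLon, $lon + $dLon];
}

// Example: a 25 km box around a sample point.
[$minLat, $maxLat, $minLong, $maxLong] = boundingBox(51.5074, -0.1278, 25.0);
printf("lat %.4f..%.4f, long %.4f..%.4f\n", $minLat, $maxLat, $minLong, $maxLong);
```

The four values are what go into `WHERE latitude BETWEEN $minLat AND $maxLat AND longitude BETWEEN $minLong AND $maxLong`, preferably as bound parameters rather than interpolated strings; the box only pre-filters, so the distance calculation is still needed to discard the corners.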

There's a good explanation of this (with PHP code) on the Movable Type website that should be essential reading for anybody planning to do any GeoPositioning work in PHP.

EDIT: The value 6371.01 in the calcDistance stored procedure is the Earth's radius in kilometers, which is why the result comes back in kilometers. Use the radius in another unit as the multiplier if you want the result in miles, nautical miles, meters or whatever.
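Since the multiplier is just the Earth's radius expressed in the unit you want back, the alternatives can be derived from 6371.01 km with the exact conversion factors (1 international mile = 1.609344 km, 1 nautical mile = 1.852 km). A small sketch; the constant and array names are illustrative only:

```php
<?php
// Earth's radius in other units, derived from the 6371.01 km used in calcDistance.
// Substitute one of these for 6371.01 in the RETURN line to change the result's unit.
const EARTH_RADIUS_KM = 6371.01;

$earthRadius = [
    'km'             => EARTH_RADIUS_KM,
    'm'              => EARTH_RADIUS_KM * 1000,
    'miles'          => EARTH_RADIUS_KM / 1.609344, // 1 international mile = 1.609344 km
    'nautical_miles' => EARTH_RADIUS_KM / 1.852,    // 1 nautical mile = 1.852 km
];

foreach ($earthRadius as $unit => $radius) {
    printf("%-15s %.2f\n", $unit, $radius);
}
```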

Mark Baker
With _'a database that supports GeoIP'_, do you actually mean _'with spatial or geocoded indexes'_ (e.g. PostGIS and the like)? I don't see how the `IP` in `GeoIP` relates to the question ;)
Wrikken
@Wrikken PostGIS was the database on my mind when I typed "a database that supports GeoIP"... <blush> the IP was typed without thinking, because I've been playing with reading lat/long from IP and using that as my base location for distance calculations recently.
Mark Baker
A: 

If your question is a "find my nearest" or "store finder" type question, then you can google for those terms. Generally, though, that type of data is accompanied by a postal code of some description, and it is possible to narrow down the list (as Mark Baker points out) by association with postal code.

Every case is different, and this may not apply to you, just throwing it out there.

Cups
+1  A: 

    SELECT events.id FROM events ORDER BY pow((lat - pointlat),2) + pow((lon - pointlon),2) ASC LIMIT 0,15

You don't have to calculate the absolute distance in meters using the radius of the earth and so forth.

To get the closest points, you only need the points ordered by relative distance.
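One caveat with ordering by the plain sum of squared degree differences: away from the equator a degree of longitude covers less ground than a degree of latitude (roughly by a factor of cos(latitude)), so the ranking can come out wrong. A common fix, swapped in here and not part of the original answer, is the equirectangular approximation, which scales the longitude difference by cos(latitude) before squaring. A PHP sketch with hypothetical function names and sample points:

```php
<?php
// Squared "relative distance" in degrees, unweighted (as in the query above).
function naiveKey(float $lat, float $lon, float $pLat, float $pLon): float
{
    return ($lat - $pLat) ** 2 + ($lon - $pLon) ** 2;
}

// Equirectangular approximation: shrink the longitude difference by
// cos(latitude) of the reference point so both axes are in comparable units.
function equirectKey(float $lat, float $lon, float $pLat, float $pLon): float
{
    $dLon = ($lon - $pLon) * cos(deg2rad($pLat));
    return ($lat - $pLat) ** 2 + $dLon ** 2;
}

// Sample reference point at 60°N and two candidate points.
$point = [60.0, 10.0];
$candidates = [
    'north' => [61.0, 10.0], // 1° of latitude away (about 111 km)
    'east'  => [60.0, 11.5], // 1.5° of longitude away (about 83 km at 60°N)
];

// Sort by each key in turn and print the resulting order (nearest first).
foreach (['naiveKey', 'equirectKey'] as $key) {
    uasort($candidates, fn($a, $b) => $key($a[0], $a[1], ...$point) <=> $key($b[0], $b[1], ...$point));
    echo $key, ': ', implode(', ', array_keys($candidates)), PHP_EOL;
}
```

The unweighted key ranks "north" first, while the weighted key correctly ranks "east" first. In SQL the same idea would be `ORDER BY POW(lat - pointlat, 2) + POW((lon - pointlon) * COS(RADIANS(pointlat)), 2)`; like the original query, it still evaluates every row unless combined with a bounding-box WHERE clause.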

Henrik