views:

269

answers:

2

I have a table full of Dealers along with their latitude and longitude. I am trying to determine the top n closest dealers to any given lat and lon. I already have the function to calculate distance between locations, but I want to do as few calculations as possible (my table can contain many thousands of entries). Currently I have to calculate the distance for each entry then sort them. Is there any way to sort before I do the calculation to improve performance?

This question is good, but I will not always know my range. Should I just pick an arbitrarily high range then refine my results? I am thankful for any help the community can offer.

declare @Lat real
declare @lon real

Set @lat = 41.05
Set @lon = -73.53 

SELECT top 10
    MemberID,
    Address1,
    City,
    State,
    Zip,
    Phone,
    Lat,
    Lon,
    (SELECT fun_DistanceLatLon] (@Lat,@lon,Lat,Lon)) as mDistance --Calculate distance
FROM
    Dealers
Order by
    (SELECT fun_DistanceLatLon] (@Lat,@lon,Lat,Lon))
+1  A: 

Instead of performing your calculation on all the records, you could use a simple pythagoras calculation on your lat / long numbers and select the top 20. Then you can select the actual top 10 using your more accurate function. Whilst this is inaccurate to start with it should be accurate enough to reduce your dataset.

EDIT: Something like this

declare @Lat real 
declare @lon real 

Set @lat = 41.05 
Set @lon = -73.53  

SELECT top 10 
    MemberID, 
    Address1, 
    City, 
    State, 
    Zip, 
    Phone, 
    Lat, 
    Lon, 
    (SELECT fun_DistanceLatLon] (@Lat,@lon,Lat,Lon)) as mDistance --Calculate distance 
FROM 
    Dealers 
WHERE
    MemberId IN
(
    SELECT TOP 20
        MemberID
    FROM
        Dealers
    ORDER BY
        SQRT(SQUARE(@Lat - Lat) + SQUARE(@Lon - Lon))
)
Order by 
    (SELECT fun_DistanceLatLon] (@Lat,@lon,Lat,Lon)) 

However, I suggest you try both with your dataset and profile it to see what the actual performance difference may be.

Robin Day
Thanks. This does seem to improve performance (better performance as the number of records increases). If I don't need the actual distance, is there anything wrong with omitting the inner query and just use the 'order by SQRT(SQUARE(@Lat - Lat) + SQUARE(@Lon - Lon))'
J.Hendrix
No, for short distances, say 30 miles or less Pythagoras is accurate to several hundred feet. For greater distances go with the Haversine function.
Scot Hauder
+1  A: 

I think you'd really like a geospatial index. Otherwise, as the number of rows grows very large, you'll have to access every single row to do the Pythagorean calculation.

It looks like SQL Server supports spatial data types, and SQL Server 2008 even boasts of "new spatial indexes for high performance queries". Could you use a SQL Server spatial data type and then do a proximity query against the spatial index?

npdoty