I have a table where I'm storing Lat/Long coordinates, and I want to write a query that returns all the records within a given distance of a certain point.

This table has about 10 million records, and there's an index over the Lat/Long fields.

This does not need to be precise. Among other things, I'm considering that 1 degree Long == 1 degree Lat, which I know is not true, but the ellipse I'm getting is good enough for this purpose.

For my examples below, let's say the point in question is [40, 140], and my radius, in degrees, is 2 degrees.

I've tried this 2 ways:


1) I created a UDF to calculate the Square of the Distance between 2 points, and I'm running that UDF in a query.

SELECT Lat, Long FROM Table   
WHERE (Lat BETWEEN 38 AND 42)   
  AND (Long BETWEEN 138 AND 142)  
  AND dbo.SquareDistance(Lat, Long, 40, 140) < 4

I'm filtering by a square first, to speed up the query and let SQL use the index, and then refining that to match only the records that fall within the circle with my UDF.


2) Run the query to get the square (same as before, but without the last line), feed ALL those records to my ASP.Net code, and calculate the circle in the ASP.Net side (same idea, calculate the square of the distance to save the Sqrt call, and compare to the square of my radius).


To my surprise, calculating the circle on the .Net side is about 10 times faster than using the UDF, which leads me to believe that I'm doing something horribly wrong with that UDF...

This is the code I'm using:

CREATE FUNCTION [dbo].[SquareDistance] 
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS float
AS
BEGIN
    -- Declare the return variable here
    DECLARE @Result float
    DECLARE @LatDiff float, @LongDiff float

    SELECT @LatDiff = @Lat1 - @Lat2
    SELECT @LongDiff = @Long1 - @Long2

    SELECT @Result = (@LatDiff * @LatDiff) + (@LongDiff * @LongDiff)

    -- Return the result of the function
    RETURN @Result

END

Am I missing something here?
Shouldn't using a UDF within SQL Server be much faster than feeding about 25% more records than necessary to .Net, with the overhead of the DataReader, the communication between processes and whatnot?

Is there something I'm doing horribly wrong in that UDF that makes it run slow?
Is there any way to improve it?

Thank you very much!

+2  A: 

You can improve the performance of this UDF by NOT declaring variables and doing your calculations in-line instead. This will likely improve performance a little bit (but probably not much).

CREATE FUNCTION [dbo].[SquareDistance] 
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS float
AS
BEGIN
    Return ( SELECT ((@Lat1 - @Lat2) * (@Lat1 - @Lat2)) + ((@Long1 - @Long2) * (@Long1 - @Long2)))
END

Even better would be to remove the function and put the calculations in the original query.

SELECT Lat, Long FROM Table   
WHERE (Lat BETWEEN 38 AND 42)   
  AND (Long BETWEEN 138 AND 142)  
  AND ((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140))  < 4

There is a little bit of overhead with calling a user defined function. By removing the function, you are likely to gain a little in performance.

Also, I encourage you to check your execution plan just to make sure you are getting index seeks like you expect.
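One way to compare the variants, alongside looking at the plan, is to turn on the session statistics and run both queries back to back. This is only a sketch: `dbo.Coordinates` is a hypothetical name standing in for the real table, and the numbers you get will depend on your data and server.

```sql
-- dbo.Coordinates is a placeholder table name (assumption); substitute your own.
SET STATISTICS IO ON;    -- report logical/physical reads per statement
SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

-- Variant A: scalar UDF in the WHERE clause
SELECT Lat, Long
FROM dbo.Coordinates
WHERE Lat BETWEEN 38 AND 42
  AND Long BETWEEN 138 AND 142
  AND dbo.SquareDistance(Lat, Long, 40, 140) < 4;

-- Variant B: the same arithmetic written directly in the query
SELECT Lat, Long
FROM dbo.Coordinates
WHERE Lat BETWEEN 38 AND 42
  AND Long BETWEEN 138 AND 142
  AND ((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140)) < 4;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If both variants report the same reads but very different CPU time, the difference is coming from the per-row function call rather than from index usage.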

G Mastros
Wow, I feel pretty dumb right now for not making the leap to convert the calculation from a UDF to direct SQL... I'll try this and check how it works. As for the indexes, it's definitely using them; it doesn't even touch the table, and it's seeking, not scanning. Thank you!!
Daniel Magliola
+1  A: 

There is a lot of overhead in using a UDF.

Even coding it in-line may not be good, because an index cannot be used for the distance expression itself, although here the BETWEEN clauses should reduce the amount of data that needs to be crunched.

To extend G Mastros' idea, separate the select bit from the square bit. It may help the optimiser.

SELECT
    Lat, Long
FROM
    (
    SELECT
        Lat, Long
    FROM 
        Table   
    WHERE
        (Lat BETWEEN 38 AND 42)   
        AND
        (Long BETWEEN 138 AND 142)
    ) foo
WHERE
    ((Lat - 40) * (Lat - 40)) + ((Long - 140) * (Long - 140))  < 4

Edit: You may be able to reduce the actual calculations involved. This next idea may reduce the number of arithmetic operations per row from 7 to 5, by computing each difference once instead of twice.

    ...
    SELECT
        Lat, Long,
        Lat - 40 AS LatDiff, Long - 140 AS LongDiff
    FROM 
    ...
    (LatDiff * LatDiff) + (LongDiff * LongDiff)  < 4
    ...

Basically, try the 3 solutions offered and see what works. The optimiser may ignore the derived table, it may use it, or it may generate an even worse plan.

gbn
It may help the optimizer, but probably won't. The optimizer is smart enough to recognize a derived table and optimize the query as though it weren't there.
G Mastros
True, but it could help readability as well. Edited to move more work into the inner query, so that "Lat - 40" and "Long - 140" are calculated once instead of twice for the outer query.
gbn
A: 

Updates:

GMastros: You were absolutely right. Doing the math in the query itself is infinitely faster than the UDF. I'm using the SQUARE() function to do the multiplication, which makes it a bit more concise, but performance is the same.
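For reference, the SQUARE() version of the query looks roughly like this (a sketch using the same example point and radius as in the question; `dbo.Coordinates` is a placeholder for the real table name):

```sql
-- dbo.Coordinates is a placeholder table name (assumption).
-- SQUARE(x) is a built-in T-SQL function that returns x * x as a float.
SELECT Lat, Long
FROM dbo.Coordinates
WHERE Lat BETWEEN 38 AND 42
  AND Long BETWEEN 138 AND 142
  AND SQUARE(Lat - 40) + SQUARE(Long - 140) < 4;
```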

However, doing it this way is still twice as slow as doing the math in .Net.
I can't really understand that, but I've come to a compromise that is useful for my particular situation. It's not ideal, because I need to duplicate code, but it's the best option unless we can find a way to make the circle calculation in SQL faster.
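One possible way around the code duplication, which nobody in this thread has measured, would be an inline table-valued function instead of a scalar UDF. SQL Server expands an inline TVF into the calling query much like a view, so it may avoid the per-row call overhead while keeping the formula in one place. The names `dbo.SquareDistanceInline` and `dbo.Coordinates` below are hypothetical, and whether this matches the hand-inlined query is something to verify against your own data.

```sql
-- Hypothetical inline table-valued function (name is an assumption).
-- A single RETURN (SELECT ...) with no BEGIN/END is what makes it "inline".
CREATE FUNCTION dbo.SquareDistanceInline
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS TABLE
AS
RETURN
(
    SELECT SQUARE(@Lat1 - @Lat2) + SQUARE(@Long1 - @Long2) AS SquareDistance
);
GO

-- dbo.Coordinates is a placeholder table name (assumption).
-- CROSS APPLY evaluates the function for each row that survives the square filter.
SELECT c.Lat, c.Long
FROM dbo.Coordinates AS c
CROSS APPLY dbo.SquareDistanceInline(c.Lat, c.Long, 40, 140) AS d
WHERE c.Lat BETWEEN 38 AND 42
  AND c.Long BETWEEN 138 AND 142
  AND d.SquareDistance < 4;
```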

Thanks!

Daniel Magliola
A: 

Check this article that describes why UDFs in SQL Server are, generally speaking, a bad idea. Unless you're pretty sure the table you're invoking the UDF on will not grow much, beware that UDFs are always called on ALL the rows in your table and not (as one might wrongly guess) only on the result set. This can give you a big performance hit as the database grows.

The very good article linked also details some ways to overcome the problem, but the fact remains that SQL Server's T-SQL dialect lacks a way to create a scalar function or a deterministic one (as Oracle has).
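A note on the determinism point: SQL Server has no DETERMINISTIC keyword as Oracle does, but it does infer determinism for scalar UDFs, and a function generally has to be created WITH SCHEMABINDING to be marked deterministic. The sketch below re-creates the function from the question that way and then asks the server how it classified it. This is offered as something to try, not as a fix for the per-row call overhead discussed above.

```sql
-- Same function as in the question, with SCHEMABINDING added (an assumption
-- about how one might declare it; the rest of the thread does not use this).
ALTER FUNCTION dbo.SquareDistance
(@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
RETURNS float
WITH SCHEMABINDING
AS
BEGIN
    RETURN SQUARE(@Lat1 - @Lat2) + SQUARE(@Long1 - @Long2);
END
GO

-- Ask SQL Server how it classified the function.
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.SquareDistance'), 'IsDeterministic') AS IsDeterministic;
```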

massimogentilini