tags:

views:

937

answers:

6

Hi Everybody!

I have a large Oracle database ( 720,000 records aprox) where each record has its own geographic coordinates (lat & lng) and i need to select just the records that are in a specific distance from a point ( inside a specific radius).

Currently i've implemented a distance function (based on haversine) that i've found in an oracle forum but because the database is a bit big it spends about 50 seconds per select.

Any recomendations on how to do thi efficiently?. I know there is an extension called oracle spatial & locator but i don´t know if i can buy it or even how does it work. Thanks a lot in advance. Best regards

+3  A: 

Use a better algorithm. Instead of calculating actual Euclidian distance, which requires a square-root calculation, do your select on the linear distance that requires only subtraction and addition. I.e. if your point is at (10, 10) and your radius is 5, select all places with points inside the square formed by (10 +/- 5, 10 +/- 5).

This will catch a small number of false positives in the corners of the square. Eliminate these by double-checking the results in your application by calculating the proper Euclidian distance.

JSBangs
Finally we took this approach as valid because the distance between points are not so big (Just a hundred of meters). Thanks everybody
Fgblanch
One more thing. To make it more efficient we've created two indexes, one for the latitude column and one more for the longitude one. Now the performance is really good.
Fgblanch
A: 

If you don't need the distance to be too accurate, you can just treat the earth as flat. From this discussion:

Approximate distance in miles:

sqrt(x * x + y * y)

where x = 69.1 * (lat2 - lat1) and y = 53.0 * (lon2 - lon1)

I recently did some optimizing for mysql (outlined here: www.mooreds.com/wordpress/archives/000547 [sorry, I only get 1 hyperlink per post] ) but am not sure how many of the steps I went through are applicable to Oracle. Some definitely are (like using a bounding box if possible).

mooreds
Warning! The code above "works" only around a specific reference latitude, the "magic" coefficients are estimated based on this reference latitude. It will be bogus for region far from it.
lOranger
+3  A: 

A couple of suggestions, if you aren't already doing them...

  1. Since the Haversine calculation requires the angles in radians, if you are storing latitude and longitude in degrees, add a couple of columns and precompute the radian equivalents. More generally, pre-compute any of the values in the function that you can for the formula and store them.

  2. Consider using a simpler function to eliminate points that are well outside the radius, running the Haversine function only on those that are potential matches based on the simpler function. For degrees you could use SQRT( (69.1*dLat)2 + (53*dLong)2) ) and use some fudge factor (10%). Run your Haversine calculation only on the points that match the cruder approximation if you need better than what the simpler calculation provides.

tvanfosson
You can probably also skip the sqrt if you square the radius you are searching within.
Dolphin
+1, particularly for item 2.
DCookie
@Dolphin -- I guess I was assuming that eventually the actual distance would be needed as part of the output, but if not then you could simply square the distance for the comparison.
tvanfosson
+1  A: 

Is the "specific distance" somewhat constant? IE are you always searching for "all points within 1 mile" or does the radius change?

What percentage of the total records do you expect to get back in any given query? 10%? .10%?

If you will always have the same radius, build a grid of squares with the same length as the radius. Assign each a list of neighboring squares. Each point will know what square it is in, from which you can get a list of all the neighboring squares. Then run the calculation on only the points in those squares. This is similar to the other answers that have popped up, but will be quicker because the linear calculations are approximated in an indexed lookup rather than calculated between every point.

Even with a variable radius, you can still use the above, but you'll have to calculate how many 'neighbors' to include. These are only feasible if you're expecting to get a small subset of the total from any individual query.

David Oneill
+2  A: 

Do provide more details about the specific format of the Lat and Long values, as well as the specific formula used for implementing haversine.

There are three approaches which can speed up things. Depending on the situation we can do at least two of these.

  1. Weed-out as many records as possible by a simple attribute value comparaison.
    For these records, we don't need to calculate anything at all.
    For example, convert the maximum radius requirement to a [generous but approximate] range of the Longitude (and possibly latitude) values which would qualify

  2. Use an alternative (possibly approximative) distance measurement.
    For example, it may be faster to calculate the square of the eucldidian distance, based on a rounded-up coordinates. (And of course to compare this with the square of desired radius)

  3. Improve the way the haversine formula is implemented.

mjv
+1, especially for item 1.
DCookie
lat and long are float values in separate columns. The implmentation that i've used, i've found in this forum:http://forums.oracle.com/forums/thread.jspa?threadID=477747the ushitaki one.
Fgblanch
+2  A: 

If you have the license then Oracle Spatial might be of use

Oracle Docs - Oracle Spatial

I've not used it but a quick scan of the docs would point to the function SDO_WITHIN_DISTANCE

Paul James
+1: don't reinvent the wheel
Vincent Malgrat