views:

127

answers:

3

I have some code doing this right now. It works fine with small to medium sized lists, but when I have a list of size n > 5000 then the my algorithm can take almost 1 minute on a mobile device to run. I'm basically comparing a Coordinate object in Java to a list (Vector) of Coordinate objects.

Here's my basic algorithm:

  • traverse each element in the list nx
  • if there is less 10 items in the "10 closest" list then add nx to the list and go to the next element
  • if the "10 closest" list has 10 items already, then calculate the distance between nx and the base Coordinates
  • if the distance is less than furthest distance in the "10 closest list" then remove the furthest item from that list and replace it with nx

I keep looking at this and am trying to find a more efficient way of doing this. It's sort of like a sorting algorithm problem so there must be a better way.

Here is my distance calculation method:

public static double distance(double lat1, double lon1, double lat2, double lon2, char unit) {

  double theta = lon1 - lon2;

  double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta));

  dist = acos(dist);

  dist = rad2deg(dist);

  dist = dist * 60 * 1.1515;

  if (unit == 'K') {

    dist = dist * 1.609344;

  } else if (unit == 'N') {

    dist = dist * 0.8684;

    }

  return (dist);

}
A: 

Well, maybe it would be faster to do this with arrays. And you could compare the square of the distance instead of the distance, which means that you don't have to work with square roots.

It would be good to have the actual code.

thejh
+2  A: 

You could store your coordinates in some space partitioning tree.

Or, for a simpler approach, you could use a two-dimensional array of buckets, and check the closest buckets first, until you found enough nearest neighbors. This only works well if the coordinates are distributed evenly.

Edit: To compare the distances you could precompute 3D coordinates on the sphere and use the square of the Euclidean distance in the comparisons:

dx * dx + dy * dy + dz * dz
starblue
Here is my distance calculation algorithm. It's very precise, but I'm thinking that I might not need that precision when just calculating the 10 closest out of the list:
Steve C
public static double distance(double lat1, double lon1, double lat2, double lon2, char unit) { double theta = lon1 - lon2; double dist = Math.sin(deg2rad(lat1)) * Math.sin(deg2rad(lat2)) + Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) * Math.cos(deg2rad(theta)); dist = acos(dist); dist = rad2deg(dist); dist = dist * 60 * 1.1515; if (unit == 'K') { dist = dist * 1.609344; } else if (unit == 'N') { dist = dist * 0.8684; } return (dist); }
Steve C
Oh dear, the poor embedded device. You definitely should think about simplifying that. And better edit it into the question so it is more readable.
starblue
A: 

You might be able use something like the approach at this website to restrict the number of points that actually require you to compute the distance.

The website shows how to compute the lat, lon bounding coordinates for a point and a given distance. That is not exactly the same problem that you have, but it could serve as a filter. In your case you are apparently trying to find the 10 (or n) closest points to a given point. You could apply the following algorithm to find the 10 (or n) closest points:

For the first n points, you could go through the full-blown distance calculation that you have, saving the distance along each point.

Save the overall longest distance.

Compute the lat, lon bounding box as illustrated on the website above.

Continue through the rest of your points.

If any point is outside of the lat, lon bounding box, it cannot be closer than any of the current 10 closest points.
If it is inside the bounding box, calculate the distance.

Discard the farthest of the previous set of 10 "closest" points.

Recompute the lat, lon bounding box based on the new farthest point.

Repeat until all points processed.

The benefit of this approach is that you might be able to avoid heavy calculations for a large number of your points. Depending on the distribution of your points, you could still suffer from poor performance, such as if the points turn out to be ordered such that they are in decreasing distance from your target points (point[0] is the farthest and point[N] is the closest)).

wageoghe