views:

399

answers:

4

I have a database of geocoded entries. I need to determine which two entries are the furthest apart from a subset of the total entries. For example, I select a list of 10 entries then, from that list, determine which two places represent the greatest distance within that list.

I cannot wrap my head around how to approach this. I've considered using radians even, but nothing seems to meet the requirement.

FYI, LAMP stack going here...

+1  A: 

Brute force approach:

  1. Find the center of your list of ten by averaging the latitude and longitude values.

  2. For each (latitude,longitude) pair in your database, use the great circle formula to calculate distance from the center from step (1)

  3. Pick greatest two distances.

Obvious optimization: break the world in to N "squares" (e.g., 10 degrees longitude, 10 degrees latitude) and pre-compute the great circle distance between the centers of of each pairing. Store this in the database. Now you can quickly look for the furthest away "squares" and only check (latitude,longitude) pairs inside those tiles.

derobert
@Derobert: That optimization does not work. Since I need to format text for a better explanation see the bottom of my answer.
Eric J.
@Eric J: That's a very good point. But I think you could get around it by checking some extra squares (but still a lot less than all of them).
derobert
A: 

Here is the algorithm implemented in PHP for the distance between two points based on latitude and longitude.

Note that if the "subset of total entries" is large, you quickly have quite a few computations to do. If that's the case, you may want to consider pre-calculating distances between city pairs.

EDIT: Why 10 degree optimization does not work:

Take four squares as shown below

-------------------
|        |        |
|   A    |   B    |
|        |        |
|_______1|________|
|        |2       |
|   C    |   D    |
|        |        |
|_______3|________|

By only measuring the centers of the squares and comparing those distances, you get A and D are further apart than A and C. However, cities 1 and 3 are clearly further apart than 1 and 2.

Eric J.
+2  A: 

The following query will calculate the distance between all your points and return the two with the biggest distance:

SELECT coor1.longitude as lon1,
       coor1.latitude as lat1,
       coor2.longitude as lon2,
       coor2.latitude as lat2,
       (ACOS(
         COS(RADIANS(coor1.latitude))  * 
         COS(RADIANS(coor1.longitude)) *
         COS(RADIANS(coor2.latitude))  *
         COS(RADIANS(coor2.longitude)) + 
         COS(RADIANS(coor1.latitude))  *
         SIN(RADIANS(coor1.longitude)) *
         COS(RADIANS(coor2.latitude))  *
         SIN(RADIANS(coor2.longitude)) +
         SIN(RADIANS(coor1.latitude))  * 
         SIN(RADIANS(coor2.latitude))
         ) * 6378                        --- Use 3963.1 for miles
       ) 
AS DistanceKM
FROM coordinates coor1,
     coordinates coor2
WHERE NOT (coor1.longitude = coor2.longitude AND coor1.latitude = coor2.latitude)
ORDER BY DistanceKM DESC
LIMIT 1;                                 --- Only the biggest

Now I recommending doing those calculations before hand and storing the result in a separate table.

Andrew Moore
This will re-compute the answer each time. That may or may not be a bad thing, depending on the size of the data set and the frequency that the answer is requested. Also, on large systems, placing CPU-intensive calculation on the database isn't a good design choice (frequently better to have it on the application server). For small and medium websites this is a very good option.
Eric J.
Like I said, I recommend doing those calculations before hand and storing the result in a separate table. I posted that query for information purposes.
Andrew Moore
this looks promising, many thanks. the use case is that people can create ad hoc lists of entries, which i then need to assign a kind of point value to (the list) based on how great the distance is.
Don
+2  A: 

By the looks of it, this could be solved by first finding the convex hull of the points (using Graham's scan, for instance), and then doing rotating calipers for the diameter on that.

Sebastian P.
That works if points are on a plane. How do you do that for a sphere?
niteria
I believe it works as long as all points are on the same side of a great circle (that is, on the same half of the sphere), though that's just a hunch, so don't hold me to it.
Sebastian P.
It should work for half-sphere.
niteria