ansaurus

Question

Finding the farthest point in one set from another set

Answer 1

A:

EDIT: I meant nlog(n) where n is the sum of the sizes of both sets.

In 1-space you could do something like this (pseudocode):

Use a structure like this:

Struct Item {
    int value
    int setid
}

(1) Max Distance = 0
(2) Read all the sets into Item structures
(3) Create an Array of pointers to all the Items
(4) Sort the array of pointers by Item->value field of the structure
(5) Walk the sorted array from beginning to end. For each Item from the set whose farthest point you want, find the nearest Item with the other setid, looking both backward and forward in the array (it is not necessarily the adjacent Item). If that distance is greater than Max Distance, set Max Distance to this distance.

Return the max distance.
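A minimal Python sketch of the sort-and-walk idea in 1-space, assuming the values are plain numbers, `ref` is the (non-empty) set you measure distance to, and `query` is the set whose farthest point you want. The function and parameter names are hypothetical. A query value's nearest ref value is the closest ref value on either side of it in sorted order, which need not be its neighbour in the array, so the sketch walks the merged array once forward and once backward.

```python
def farthest_1d(ref, query):
    """Largest distance from any query value to its nearest ref value (1-space)."""
    # Tag each value with a set id, like the Item struct: 0 = ref, 1 = query.
    items = sorted([(v, 0) for v in ref] + [(v, 1) for v in query])
    nearest = [float("inf")] * len(items)

    # Forward walk: distance back to the closest ref value at or before each item.
    last_ref = None
    for i, (value, setid) in enumerate(items):
        if setid == 0:
            last_ref = value
        elif last_ref is not None:
            nearest[i] = value - last_ref

    # Backward walk: distance ahead to the closest ref value at or after each item.
    next_ref = None
    for i in range(len(items) - 1, -1, -1):
        value, setid = items[i]
        if setid == 0:
            next_ref = value
        elif next_ref is not None:
            nearest[i] = min(nearest[i], next_ref - value)

    # Max Distance, taken over the query items only.
    return max((d for d, (_, setid) in zip(nearest, items) if setid == 1), default=0)
```

For example, `farthest_1d([0, 10], [4, 7])` returns 4: the value 4 is 4 away from 0, while 7 is only 3 away from 10. The sort dominates, so this is O(n log n) in the total number of values.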

Jason Punyon 2009-02-26 20:52:37

Your answer does not make sense. Could you provide pseudocode for the 1-space version?

Sparr 2009-02-26 20:54:30

This is the 1-space version.

Jason Punyon 2009-02-26 20:55:22

How does step (4) happen in linear time?

Peter 2009-02-26 20:57:22

I apologize for my mistake: 1-space is too simple, and sorting is not a usable step for higher dimensions.

Sparr 2009-02-26 21:02:48

Answer 2

+7 A:

First you need to find every element's nearest neighbor in the other set.

To do this efficiently you need a nearest neighbor algorithm. Personally I would implement a kd-tree just because I've done it in the past in my algorithm class and it was fairly straightforward. Another viable alternative is an R-tree.

Do this once for each element in the smaller set: build the tree on the larger set, then query it with each element of the smaller set to find that element's nearest neighbor.

From this you should be able to get a list of nearest neighbors for each element.

While finding the nearest-neighbor pairs, keep them in a data structure with a fast insert and a fast getMax operation, such as a heap ordered by Euclidean distance.

Then, once you're done, simply ask the heap for the max.
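A minimal pure-Python sketch of the kd-tree plus heap approach described above, assuming points are equal-length tuples, both sets are non-empty, and Python 3.8+ (for `math.dist`). The helper names `build_kdtree`, `nearest` and `farthest_point` are hypothetical. The result depends on which set is queried, so the sketch builds the tree on `reference` and queries it with every point of `query`, the set whose farthest point you want. The tree is built by re-sorting at each level, which is simpler but slower (O(M log² M)) than a linear-time median split.

```python
import heapq
import math


def build_kdtree(points, depth=0):
    """Build a kd-tree as nested tuples: (point, axis, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinate axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                  # median point becomes this node
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (distance, point) for the tree point closest to target."""
    if node is None:
        return best
    point, axis, left, right = node
    d = math.dist(point, target)
    if best is None or d < best[0]:
        best = (d, point)
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    # The far side can only hold a closer point if the splitting plane is
    # nearer than the best distance found so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best


def farthest_point(reference, query):
    """Return (point in query, distance) whose nearest reference point is farthest."""
    tree = build_kdtree(list(reference))
    heap = []
    for q in query:
        d, _ = nearest(tree, q)
        # heapq is a min-heap, so push negated distances to keep the max on top.
        heapq.heappush(heap, (-d, q))
    neg_d, q = heap[0]
    return q, -neg_d
```

For example, `farthest_point([(0, 0), (10, 0)], [(1, 0), (5, 5), (9, 1)])` returns the point `(5, 5)` at distance sqrt(50), roughly 7.07. As the answer notes, a single running maximum can replace the heap if only the farthest pair matters.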

The run time for this breaks down as follows:

N = size of the smaller set
M = size of the larger set

O(M log M) to build the kd-tree on the larger set.
N * O(log M) for all the kd-tree nearest neighbor checks.
N * O(1) for calculating the Euclidean distance before adding it to the heap.
N * O(log N) for adding the pairs into the heap.
O(1) to get the final answer :D

So in the end the whole algorithm is O((M + N) log M), or O(N log M) if the tree on the larger set has already been built.

If you don't care about the order of the pairs, you can save a bit of time and space by keeping only the maximum found so far.

*Disclaimer: This all assumes you won't be using an enormously high number of dimensions and that your elements follow a mostly random distribution.

Ben S 2009-02-26 20:54:38

Answer 3

A:

The most obvious approach seems to me to be to build a tree structure on one set to allow you to search it relatively quickly. A kd-tree or similar would probably be appropriate for that.

Having done that, you walk over all the points in the other set and use the tree to find their nearest neighbour in the first set, keeping track of the maximum as you go.

It's n log(n) to build the tree and log(n) for one search, so the whole thing should run in n log(n).
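Spelling out the cost with two sets, writing n_A for the size of the set the tree is built on and n_B for the size of the set that is walked (symbols introduced here for illustration), and assuming the average O(log n) kd-tree search the answer relies on:

```latex
\underbrace{O(n_A \log n_A)}_{\text{build tree on } A}
+ \underbrace{n_B \cdot O(\log n_A)}_{\text{one search per point of } B}
= O\big((n_A + n_B)\log n_A\big) \subseteq O(n \log n),
\qquad n = n_A + n_B
```

Since n_A ≤ n, splitting the points into two sets does not change the n log(n) bound in the total number of points.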

Peter 2009-02-26 20:56:34

That is true if all elements are in the same set, but there are two sets to handle.

Ben S 2009-02-26 21:03:30

I think I'm talking about pretty much the same idea as yours, except for skipping the heap; unless I misunderstood the question, all we need to find is the maximum.

Peter 2009-02-26 21:30:42

Answer 4

A:

To make things more efficient, consider using a Pigeonhole algorithm: group the points in your reference set (your colorTable) by their location in n-space. This lets you find the nearest neighbour efficiently without iterating over all the points.

For example, if you were working in 2-space, divide your plane into a 5 x 5 grid, giving 25 squares, with 25 groups of points.

In 3 space, divide your cube into a 5 x 5 x 5 grid, giving 125 cubes, each with a set of points.

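Then, to test point n, find the square/cube/group that contains n and test its distance to the points in that group. You only need to test points from neighbouring groups if point n is closer to the edge of its group than to the nearest neighbour found in the group. (If the group is empty, keep widening the search to neighbouring groups until you find a point.)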
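A minimal 2-space sketch of the grouping idea, assuming points are `(x, y)` tuples and a non-empty reference set. Instead of a fixed 5 x 5 grid, it buckets points by a hypothetical `cell` width and searches rings of cells outward from the query's cell; this also covers the empty-group case. All function names are hypothetical. The stopping rule uses the fact that any cell outside the rings already searched lies at least `ring * cell` away from the query point.

```python
import math
from collections import defaultdict


def build_grid(points, cell):
    """Bucket 2-D points by the integer grid cell that contains them."""
    grid = defaultdict(list)
    for p in points:
        grid[(math.floor(p[0] / cell), math.floor(p[1] / cell))].append(p)
    return grid


def grid_nearest(grid, cell, q):
    """Distance from q to its nearest bucketed point, scanning rings of cells outward."""
    cx, cy = math.floor(q[0] / cell), math.floor(q[1] / cell)
    best = math.inf
    ring = 0
    while True:
        # Visit only the cells whose Chebyshev distance from q's cell is exactly `ring`.
        for gx in range(cx - ring, cx + ring + 1):
            for gy in range(cy - ring, cy + ring + 1):
                if max(abs(gx - cx), abs(gy - cy)) != ring:
                    continue
                for p in grid.get((gx, gy), ()):
                    best = min(best, math.dist(p, q))
        # Every unvisited cell is at least `ring * cell` away from q, so stop
        # once no farther ring could hold a closer point.
        if best <= ring * cell:
            return best
        ring += 1


def farthest_point_grid(reference, query, cell=1.0):
    """Return (point in query, distance) farthest from its nearest reference point."""
    grid = build_grid(reference, cell)
    if not grid:
        raise ValueError("reference set must not be empty")
    best_point, best_dist = None, -1.0
    for q in query:
        d = grid_nearest(grid, cell, q)
        if d > best_dist:
            best_point, best_dist = q, d
    return best_point, best_dist
```

The cell width trades off bucket size against the number of rings scanned; a width comparable to the typical spacing of the reference points is a reasonable starting guess. Extending to 3-space means adding a third grid index and a third nested loop.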

Bevan 2009-02-26 21:01:51

kd-trees do something similar to this.

Ben S 2009-02-26 21:02:41

Answer 5

A:

For each point in set B, find the distance to its nearest neighbor in set A; the point in B with the largest such distance is the farthest point.

To find the distance to each nearest neighbor, you can use a kd-tree as long as the number of dimensions is reasonable, there aren't too many points, and you will be doing many queries; otherwise building the tree will cost more than it saves.
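When a tree is not worthwhile, the definition above can be computed directly. A minimal brute-force sketch, assuming points are equal-length tuples, set A is non-empty, and Python 3.8+; the function name is hypothetical. It performs |A|·|B| distance computations and keeps only the running maximum, which also makes it a handy check for a faster version.

```python
import math


def farthest_point_bruteforce(set_a, set_b):
    """Return (point in set_b, distance) maximizing the distance to the nearest point of set_a."""
    best_point, best_dist = None, -1.0
    for b in set_b:
        nearest_dist = min(math.dist(a, b) for a in set_a)  # O(|A|) per point of B
        if nearest_dist > best_dist:  # keep only the running maximum
            best_point, best_dist = b, nearest_dist
    return best_point, best_dist
```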

mbeckish 2009-02-26 21:03:00
