views:

167

answers:

3

I have two sets of Animal objects. The distance between animals is defined using a specific algorithm that looks at their traits. I am trying to design a method to find the pair from the two sets (one from each) that minimizes the distance.

One idea I had: create a parametrized Tuple class to pair up Animals. Create a PriorityQueue with a comparator to sort Tuple<Animal> according to the distance between the two members. Then, pick the first pair from the PriorityQueue.

Is this good design, or is it wasteful? I believe it would run in O(m+n) time, where m and n are the sizes of each collection.

If Tuple is a parametrized class, how would it work to use a Comparator on it that only works on Animal?

I want to use this findMinimalPair method to create a spanning tree minimizing the distances of a graph of Animal objects. What if I did this by continuously popping pairs off the PriorityQueue, checking to make sure that each pair still contained one member of each collection?

Here is a basic example. Here are the distances:

     A0     A1     A2     A3
A0   0      20     33     8
A1   20     0      102    73
A2   33     102    0      6
A3   8      73     6      99

Assume that the collections are:

A0

A1, A2, A3

Here would be the sorted order of tuples, by distance:

(A0, A3) - 8 
(A0, A1) - 20 
(A0, A2) - 33

So we see that A3 is the closest. A3 is then moved into the first collection:

A0, A3

A1, A2

Again, we check for a minimal pair:

(A3, A2) - 6
(A0, A1) - 20 
(A0, A2) - 33
(A3, A1) - 73

Now A2 is taken. See how it works?

This is what I ended up doing. Comments?

+1  A: 

Actually, you will need to create m*n tuples in order to have all possible tuples, which will take O(mn). The you need to sort the list of tuples which take at minimum O(mn*log(mn)), so the complexity is O(mn*log(mn)) - even with priority queue (you will have mn inserts, with O(log(mn)) complexity per each).

EDIT

Just saw an error in the above solution - If you just want to find the minimal pair, the actual complexity is O(mn) since you need one path on all the pairs. If you want to have a list of all the pairs sorted by their distance in order to have the minimum spanning tree then it is O(mn*log(mn)). In any case, it is not O(m+n)

David Rabinowitz
A: 

My suggestion is for u to use the tuples which will give u O(n.m) (not O(n+m)). And then use the bucket sort algorithm bucket sort where each bucket will be a tuple .. so u will get O(n.m) on your problem. (from what I understood of your problem u can use bucket sort, otherwise u will have to use a O(nlogn) algorithm )

Diego Dias
A: 

If you want to find the distances between all possible tuples between set 1 and set 2, I think you may be stuck with nested loops, resulting in O(m*n). I don't think there's any way you can get O(m+n).

If I recall correctly, you need a completed graph before you can find its minimum spanning tree. It seems like the graph will have O(V^2) edges (since there's a distance between each pair of animals), so if you're using Prim's Algorithm for finding minimum spanning trees you may want to use the Fibonacci heap and adjacency list (which is O(E + V log(V))).

http://en.wikipedia.org/wiki/Prim%27s%5Falgorithm

Seth