views:

120

answers:

4

a coordinate a(xa,ya) dominates b(xb,yb) if ( xa>=xb and ya>=yb) how can I find all pairs in a set of coordinates in nlgn using divide and conquer?

edit:the number of pairs instead.

A: 

Do a quick sort where your comparison is first to sort by X then Y ( so you'd get something like 5,3 5,2 4,7 4,2 etc. Quicksort is nlogn

Then just iterate from the highest point down doing your compare. That would be at most O(n). You end up with O(n) + O(nlogn) => O(nlogn)

Quicksort uses divide and conquer - it divides on the pivot.

EDIT:

Another thing I considered. You can walk the entire set and put all the points that are dominated in the X coordinate by your point in a set. Then, walk that smaller subset and filter out the ones that are also dominated by your Y. This is just two walks, for O(n) performance.

Anon
That will find one pair in `O(n lg n)`, but he wants to find all pairs.
danben
but to find all pairs this algorithm would take O(N^2)
ryanxu
O(n logn) is quicksort average. Worst case is O(n^2), So I believe you can't sort here
belisarius
You could instead use a different sorting algorithm like merge sort which is O(n log n) in both average and worst cases. (But quicksort is usually faster and you almost never end up with the worst case scenario.)
Ben Alpert
You should a STABLE QuickSort.
Fabio F.
@Fabio: QuickSort is not a stable algorithm in efficient implementations. If you need stability, you should usually use Merge Sort. There's no need for a stable sort here in any case.
Billy ONeal
+1  A: 

Roughly speaking, any given vector (xa,ya) will dominate or be dominated by about half of the other vectors (ya,yb), because among the four cases for {xa <=> ya, xb <=>yb}, two are cases of dominance.

So we expect the solution to your problem to comprise about n*(n/2) pairs of vectors. The algorithm can't be cheaper than its solution, so n*ln(n) is not going to work.

Josephine
This is what I found,but I dont quite understand it.http://www.sable.mcgill.ca/~nngman/comp507/ especially "If we choose to sort all the points in their y-coordinates, then binary search can be used to identify all points in A that are dominated in just O(log N) time for each point in B. "
ryanxu
@ryanxu: The page you linked to is about finding the _number_ of points dominated by each point and able to solve range queries efficiently. What you said in the question is quite different.
Moron
If I want to find the number of pairs, how would i do it in nlgn
ryanxu
@ryanxu: Use the algorithm in the page you linked, and add up the values for each point.
Moron
@Moron, is there a pseudo code for it?
ryanxu
@Moron: I am wondering how they did this step"If we choose to sort all the points in their y-coordinates, then binary search can be used to identify all points in A that are dominated in just O(log N) time for each point in B. " I am not sure how they do this. To sort all the y-coordinates will take nlgn time each time, for the a side
ryanxu
@ryanxu: You only need to sort once.
Moron
@Moron, the size of the A is always changing right. So dont you have to sort it for every single split?
ryanxu
@ryanxu. No, you can sort all of the points by y in O(nlogn) time before even splitting. While splitting you can use another array which contains the points of A in sorted by y order and use that for B.
Moron
A: 

lets say you have the points such that

a1 > a2 > a3 ... > an-1 > an d (read > as dominates) n is proporional to the number of points (N) in all situations.

the number of pairs itself is greater than nlogn (such as (A[1],A[2..n]) (A2], A[3..n]) ..A[n=1], A[n]) I think that is n (n-1) / 2.

so N logN doesn't seem to be possible

neal aise
A: 

Assuming that you only need to count the number of pairs, you can take the sweeping approach:

1) Sort the points according to their X values

2) Collect the points into a search tree.

Here we use a balanced tree based on the Y values of the points. The tree should have a counter per internal node, indicating the number of items in the subtree rooted by it. These counters can be maintained without any impact on the time complexity of the tree operations. The usage of counters allows querying the number of items lower than a given value V, in logarithmic time.

More details of (2): we scan the points obtained in step (1) from left to right. For each point P traversed, we add P to the tree, and then compute the number of items with Y < P.Y. This number is added to the global count which is returned at the end.

Step (1) runs in N*Log(N) time, and step (2) performs N iterations of two Log(N) operations, therefore it has the same complexity.

Eyal Schneider