views:

1234

answers:

9

I can use the median of medians selection algorithm to find the median in O(n). Also, I know that after the algorithm is done, all the elements to the left of the median are less that the median and all the elements to the right are greater than the median. But how do I find the k nearest neighbors to the median in O(n) time?

If the median is n, the numbers to the left are less than n and the numbers to the right are greater than n. However, the array is not sorted in the left or the right sides. The numbers are any set of distinct numbers given by the user.

The problem is from Introduction to Algorithms by Cormen, problem 9.3-7

A: 

If you know the index of the median, which should just be ceil(array.length/2) maybe, then it just should be a process of listing out n(x-k), n(x-k+1), ... , n(x), n(x+1), n(x+2), ... n(x+k) where n is the array, x is the index of the median, and k is the number of neighbours you need.(maybe k/2, if you want total k, not k each side)

glasnt
This doesn't work. The median of median algorithms DOESNT sort the items. To do so would take O(n log n), whereas median-of-medians works on O(n).
Steve314
Ah, apologies. I read the original question at version 2, where he added that he already had it sorted in order.
glasnt
+1  A: 

Having partitioned your set, finding the greatest element of the "left" half is O(N/2), and finding the least element of the "right" half is also O(N/2).

Repeat until you have all the elements you want. Complexity is O(k*N), which for constant k is O(N). Actually, you can select and order the k/2 greatest elements a bit more efficiently than selecting the greatest element k/2 times, but it's the same complexity as far as N is concerned.

Edit2: Much questioning whether k is constant, or might be any value less than N. OK, if you want to select (without sorting) the k/2 least elements greater than the median: do another quickselect on the right hand side of the partition, this time not partitioning on the median, but instead on the (k/2)th element. Then do the same on the lhs, for the (N/2 - k/2)th element. But actually, if you were going to do this then you never should have selected the median in the first place: you should have first partitioned the original array on the (N/2 + k/2)th element, then partitioned again on the (N/2 - k/2)th element of the left side of that partition. If you want to select and sort k elements around the median in O(N) comparisons, and k can be anything up to N, then obviously that's impossible, because if it was possible then you could sort the whole array in less than O(N log N).

Edit: I may have misinterpreted "nearest neighbours". Do you want the k elements closest to the median by value (i.e. using subtraction), or the k elements closest to the median by sorted position? I've given you the latter, but you can get the former in the same O(N) by for example grabbing and sorting k elements on each side of the median (2*k total) as already described. Then start at the "outsides" and work inwards from both directions, discarding k elements by excluding whichever of the two candidates is furthest from the median by value.

I can't immediately think of a way of doing this nearest-by-value selection in O(N) (without sorting) in the case where k could be anything up to N.

Steve Jessop
I'm not sure if the complexity is O(n) since k is given by the user and it could be equal to n
Anonymous
A: 

First select the median in O(n) time, using a standard algorithm of that complexity. Then run through the list again, selecting the elements that are nearest to the median (by storing the best known candidates and comparing new values against these candidates, just like one would search for a maximum element).

In each step of this additional run through the list O(k) steps are needed, and since k is constant this is O(1). So the total for time needed for the additional run is O(n), as is the total runtime of the full algorithm.

sth
While true that O(k) is O(1) when k is constant, if k -> n then this becomes O(n^2).Also, how do you know k is constant? If it is, then can't n also be considered constant?
Kirk Broadhurst
+2  A: 

The median-of-medians probably doesn't help much in finding the nearest neighbours, at least for large n. True, you have each column of 5 partitioned around it's median, but this isn't enough ordering information to solve the problem.

I'd just treat the median as an intermediate result, and treat the nearest neighbours as a priority queue problem...

Once you have the median from the median-of-medians, keep a note of it's value.

Run the heapify algorithm on all your data - see Wikipedia - Binary Heap. In comparisons, base the result on the difference relative to that saved median value. The highest priority items are those with the lowest ABS(value - median). This takes O(n).

The first item in the array is now the median (or a duplicate of it), and the array has heap structure. Use the heap extract algorithm to pull out as many nearest-neighbours as you need. This is O(k log n) for k nearest neighbours.

So long as k is a constant, you get O(n) median of medians, O(n) heapify and O(log n) extracting, giving O(n) overall.

Steve314
Isn't the complexity of heapify O(nlogn)?
Anonymous
If you do it the dumb way (insert each item in turn into an initially empty heap) it's O(n log n). If you use the heapify algorithm, it is O(n). See the wikipedia page ("Building a heap" section) for more detail.
Steve314
A: 

You already know how to find the median in O(n)

if the order does not matter, selection of k smallest can be done in O(n) apply for k smallest to the rhs of the median and k largest to the lhs of the median

from wikipedia

 function findFirstK(list, left, right, k)
 if right > left
     select pivotIndex between left and right
     pivotNewIndex := partition(list, left, right, pivotIndex)
     if pivotNewIndex > k  // new condition
         findFirstK(list, left, pivotNewIndex-1, k)
     if pivotNewIndex < k
         findFirstK(list, pivotNewIndex+1, right, k)

don't forget the special case where k==n return the original list

gnibbler
A: 

You could use a non-comparison sort, such as a radix sort, on the list of numbers L, then find the k closest neighbors by considering windows of k elements and examining the window endpoints. Another way of stating "find the window" is find i that minimizes abs(L[(n-k)/2+i] - L[n/2]) + abs(L[(n+k)/2+i] - L[n/2]) (if k is odd) or abs(L[(n-k)/2+i] - L[n/2]) + abs(L[(n+k)/2+i+1] - L[n/2]) (if k is even). Combining the cases, abs(L[(n-k)/2+i] - L[n/2]) + abs(L[(n+k)/2+i+!(k&1)] - L[n/2]). A simple, O(k) way of finding the minimum is to start with i=0, then slide to the left or right, but you should be able to find the minimum in O(log(k)).

The expression you minimize comes from transforming L into another list, M, by taking the difference of each element from the median.

m=L[n/2]
M=abs(L-m)

i minimizes M[n/2-k/2+i] + M[n/2+k/2+i].

outis
A: 

Since all the elements are distinct, there can be atmost 2 elements with the same difference from the mean. I think it is easier for me to have 2 arrays A[k] and B[k] the index representing the absolute value of the difference from the mean. Now the task is to just fill up the arrays and choose k elements by reading the first k non empty values of the arrays reading A[i] and B[i] before A[i+1] and B[i+1]. This can be done in O(n) time.

Anonymous
"choose k elements by reading the first k non empty values of the arrays" -- in order to do that, the arrays have to be sorted. Sorting those arrays takes time O(n log n).
Windows programmer
@Windows programmer: only if you're doing a comparison based sort.
outis
This works if the numbers are integers only
Anonymous
+1  A: 

Actually, the answer is pretty simple. All we need to do is to select k elements with the smallest absolute differences from the median moving from m-1 to 0 and m+1 to n-1 when the median is at index m. We select the elements using the same idea we use in merging 2 sorted arrays.

Anonymous
A: 

med=Select(A,1,n,n/2) //finds the median

for i=1 to n B[i]=mod(A[i]-med)

q=Select(B,1,n,k+1)

j=0 for i=1 to n if B[i]<=q C[j]=B[i] j++

return C

Gaurav Pandey