Algorithm for Finding nth smallest/largest element in an array using data strucuture self balancing binary search tree..

Read the post: http://stackoverflow.com/questions/2329171/find-kth-smallest-element-in-a-binary-search-tree-in-optimum-way/2329236#2329236. But the correct answer is not clear, as i am not able to figure out the correct answer, for an example that i took...... Please a bit more explanation required.......


With a loop I guess. (Find the (or a) max, pop the array) i times

+2  A: 

First sort the array descending, then take the ith element.

This has O(N log N) complexity, but the job can be done with O(N) complexity.
Jerry Coffin
Using a selection sort to sort the array, but stopping at the i'th iteration of the outer loop also has O(N) guaranteed complexity, but in practice will be slower than the select algorithm of your answer.
@Jerry: Since nearly all programming languages come with a highly-optimized sort function, the savings in code-complexity is often well worth the extra factor of log N.
Daniel Stutzbach
@GregS: What do you mean? That would be O(N*I) where I is the index of the element.
@Daniel: I suppose it depends on what you want. Personally, I'd just use a language/library that already includes an optimized `select` function.
Jerry Coffin
@sdcwc it would *not* be proportional to n, so i is a constant so the complexity remains O(n)
neal aise
+12  A: 

C.A.R. Hoare's select algorithm is designed for precisely this purpose. It executes in [expected] linear time, with logarithmic extra storage.

Edit: the obvious alternative of sorting, then picking the right element has O(N log N) complexity instead of O(N). Storing the i largest elements in sorted order requires O(i) auxiliary storage, and roughly O(N * i log i) complexity. This can be a win if i is known a priori to be quite small (e.g. 1 or 2). For more general use, select is usually better.

Edit2: offhand, I don't have a good reference for it, but described the idea in a previous answer.

Jerry Coffin
Have a good online reference by chance?
Jamie Wong
The select algorithm is what quicksort is based on. In select you don't have to process both sides of the array. You also don't have to keep processing sub-arrays after the sub-array size gets small enough (about 2*i).
Here's a good reference. It's better known as quickselect: http://en.wikipedia.org/wiki/Selection_algorithm#Partition-based_general_selection_algorithm
Nick Johnson
Also, you can find a good reference in Skiena's Algorithm Design Manual at page 436.

Create a sorted data structure to hold i elements and set the initial count to 0.

Process each element in the source array, adding it to that new structure until the new structure is full.

Then process the rest of the source array. For each one that is larger than the smallest in the sorted data structure, remove the smallest from that structure and put the new one in.

Once you've processed all elements in the source array, your structure will hold the i greatest elements. Just grab the last of these and you have your i'th greatest element.


Alternatively, sort it then just grab the i'th element directly.


That's a fitting task for the heaps which feature very low insert and low delete_min costs. E.g. pairing heaps. It would have the worst case O(n*log(n)) performance. But since non-trivial to implement, better check first suggested elsewhere selection algorithms.


There are many strategies available for your task (if you don't focus on the self-balancing tree to begin with).

It's usually a tradeoff speed / memory. Most algorithms require either to modify the array in place or O(N) additional storage.

The solution with self-balancing tree is in the latter category, but it's not the right choice here. The issue is that building the tree itself takes O(N*log N), which will dominate the later search term and give a final complexity of O(N*log N). Therefore you're not better than simply sorting the array and use a complex datastructure...

In general, the issue largely depends on the magnitude of i related to N. If you think for a minute, for i == 1 it's trivial right ? It's called finding the maximum.

Well, the same strategy obviously work for i == 2 (carrying the 2 maximum elements around) in linear time. And it's also trivially symmetric: ie if you need to find the N-1 th element, then just carry around the 2 minimum elements.

However, it loses efficiency when i is about N/2 or N/4. Carrying the i maximum elements then mean sorting an array of size i... and thus we fallback on the N*log N wall.

Jerry Coffin pointed out a simple solution, which works well for this case. Here is the reference on Wikipedia. The full article also describes the Median of Median method: it's more reliable, but involves more work and is thus generally slower.

Matthieu M.
Create an empty list L

For each element x in the original list,

    add x in sorted position to L
    if L has more than i elements, 
        pop the smallest one off L

if List2 has i elements, 
    return the i-th element, 
    return failure

This should take O(N (log (i))). If i is assumd to be a constant, then it is O(N).

Larry Watanabe
Can you provide one example for thiss...... Looks pretty good one.... but not exactly understood.....