ansaurus

Question

Finding Nth item of unsorted list without sorting the list

Answer 1

+1 A:

Use heapsort. It only partially orders the list until you draw the elements out.

UncleO 2009-06-23 20:07:22

Try to find the n/2-th element - Requires O(nlogn)!

Dario 2009-06-23 20:13:42

Answer 2

+3 A:

You can iterate the entire sequence maintaining a list of the 5 largest values you find (this will be O(n)). That being said I think it would just be simpler to sort the list.

Andrew Hare 2009-06-23 20:07:52

But when it isn't the fifth but the nth element, you'll have O(n²) which is even worse than sorting.

Dario 2009-06-23 20:10:12

I suppose you mean to maintain a list of the N largest values. But N cannot be too large in that case.

Bruno Ranschaert 2009-06-23 20:11:01

Answer 3

+1 A:

You essentially want to produce a "top-N" list and select the one at the end of that list.

So you can scan the array once and insert into an empty list when the largeArray item is greater than the last item of your top-N list, then drop the last item.

After you finish scanning, pick the last item in your top-N list.

An example for ints and N = 5:

int[] top5 = new int[5]();
top5[0] = top5[1] = top5[2] = top5[3] = top5[4] = 0x80000000; // or your min value

for(int i = 0; i < largeArray.length; i++) {
    if(largeArray[i] > top5[4]) {
       // insert into top5:
       top5[4] = largeArray[i];

       // resort:
       quickSort(top5);
    }
}

Jeff Meatball Yang 2009-06-23 20:09:33

Answer 4

+8 A:

Sorting would require O(nlogn) runtime at minimum - There are very efficient selection algorithms which can solve your problem in linear time.

Partition-based selection (sometimes Quick select), which is based on the idea of quicksort (recursive partitioning), is a good solution (see link for pseudocode + Another example).

Dario 2009-06-23 20:12:07

Nice link. I believe this is the best.

Jeff Meatball Yang 2009-06-23 20:26:02

Answer 5

+1 A:

As people have said, you can walk the list once keeping track of K largest values. If K is large this algorithm will be close to O(n²).

However, you can store your Kth largest values as a binary tree and the operation becomes O(n log k).

According to Wikipedia, this is the best selection algorithm:

 function findFirstK(list, left, right, k)
     if right > left
         select pivotIndex between left and right
         pivotNewIndex := partition(list, left, right, pivotIndex)
         if pivotNewIndex > k  // new condition
             findFirstK(list, left, pivotNewIndex-1, k)
         if pivotNewIndex < k
             findFirstK(list, pivotNewIndex+1, right, k)

Its complexity is O(n)

Unknown 2009-06-23 20:18:55

I believe the Tournament Algorithm, see Dario's links, is what you are shooting for. It has an operation of O(n + k*log(n)).

tgray 2009-06-23 20:43:54

My mistake, though I would be interested to see a full implementation of this in Python.

tgray 2009-06-24 11:44:26

Answer 6

+3 A:

A simple modified quicksort works very well in practice. It has average running time proportional to N (though worst case bad luck running time is O(N^2)).

Proceed like a quicksort. Pick a pivot value randomly, then stream through your values and see if they are above or below that pivot value and put them into two bins based on that comparison. In quicksort you'd then recursively sort each of those two bins. But for the N-th highest value computation, you only need to sort ONE of the bins.. the population of each bin tells you which bin holds your n-th highest value. So for example if you want the 125th highest value, and you sort into two bins which have 75 in the "high" bin and 150 in the "low" bin, you can ignore the high bin and just proceed to finding the 125-75=50th highest value in the low bin alone.

SPWorley 2009-06-23 20:23:03

Answer 7

+4 A:

A heap is the best data structure for this operation and Python has an excellent built-in library to do just this, called heapq.

import heapq

def nth_largest(n, iter):
    return heapq.nlargest(n, iter)[-1]

Example Usage:

>>> import random
>>> iter = [random.randint(0,1000) for i in range(100)]
>>> n = 10
>>> nth_largest(n, iter)
920

Confirm result by sorting:

>>> list(sorted(iter))[-10]
920

FogleBird 2009-06-24 03:21:47

Answer 8

A:

You could try the Median of Medians method - it's speed is O(N).

2010-01-25 03:07:17

ansaurus

tags:

views:

answers:

Finding Nth item of unsorted list without sorting the list

related questions