views: 11770
answers: 15

I believe there's a way to find the kth largest element in an unsorted array of length n in O(n). Or perhaps it's "expected" O(n) or something. How can we do this?

Cheers!

p.s. this is not for homework.

A: 

Iterate through the list, keeping a running top-5 (or top-k). If the current value is larger than the stored largest value, store it as the new largest, bump values 1-4 down a slot, and let 5 drop off the list. If not, compare it to stored value number 2 and do the same thing. Repeat, checking it against all 5 stored values. This should do it in O(n).

Kevin
That "bump" is O(n) if you're using an array, or down to O(log n) (I think) if you use a better structure.
Just Some Guy
It needn't be O(log k) - if the list is a linked list then adding the new element to the top and dropping the last element is more like O(1)
Alnitak
The bump would be O(k) for an array-backed list, O(1) for an appropriately-linked list. Either way, this sort of question generally assumes it to be of minimal impact compared to n and it introduces no more factors of n.
bobince
it would also be O(1) if the bump uses a ring-buffer
Alnitak
Anyhow, the comment's algorithm is incomplete: it fails to consider an incoming element that is the new (e.g.) second-largest. Worst-case behaviour, where each of the n elements must be compared with every entry in the high-score table, is O(kn) - but that still probably means O(n) in terms of the question.
bobince
An algorithm with time O(kn) actually has a worst case of O(n^2) when k = n. Although in that case it would be faster to look for the smallest item; the algorithm could always be reversed when k > n/2 to look for the (n-k+1)th smallest item instead.
Elie
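The high-score-table approach from this answer can be sketched in C++ (a sketch of the idea only; the name `top_k` and the generalization from 5 to k entries are mine):

```cpp
#include <cstddef>
#include <vector>

// Keep the k largest values seen so far in descending order. Each new
// element is compared against the table and, if it belongs, bumped into
// place while the smallest entry drops off. The bump is O(k) per element,
// so the whole scan is O(kn), i.e. O(n) when k is a fixed constant.
std::vector<int> top_k(const std::vector<int>& a, std::size_t k) {
    std::vector<int> best;  // kept sorted, largest first
    for (int e : a) {
        // find where e belongs among the current leaders
        std::size_t pos = 0;
        while (pos < best.size() && best[pos] >= e) ++pos;
        if (pos < k) {
            best.insert(best.begin() + pos, e);  // the O(k) "bump"
            if (best.size() > k) best.pop_back();
        }
    }
    return best;
}
```

Once at least k elements have been scanned, `top_k(a, k).back()` is the kth largest.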
+3  A: 

A quick Google on that ('kth largest element array') returned this: http://discuss.joelonsoftware.com/default.asp?interview.11.509587.17

"Make one pass through tracking the three largest values so far." (it was specifically for 3rd largest)

and..

Build a heap/priority queue.  O(n)
Pop top element.  O(log n)
Pop top element.  O(log n)
Pop top element.  O(log n)

Total = O(n) + 3 O(log n) = O(n)
warren
Well, it's actually O(n) + O(k log n), which doesn't reduce for significant values of k.
Jimmy
right - Big-O is all about approximations :)
warren
also note: I quoted the site :)
warren
Tracking can be done with a doubly linked list that you keep at fixed length. The last item is then the kth largest element. Insertion at the front and removal at the back are both O(1), and lookup at the back is O(1) too.
Jasper Bekkers
But finding the insertion point in that doubly-linked list is O(k).
Just Some Guy
And if k is fixed, O(k) = O(1)
Tyler McHenry
@warren: Big-O is approximating, but you always over-approximate. Quicksort is actually O(n^2), for example, since that is the worst case. this one is O(n + k log n).
Claudiu
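The build-then-pop recipe quoted above, generalized from three pops to k, might look like this in C++ (the function name is mine; this is a sketch, not the quoted site's code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Build a max-heap in O(n), then pop the top k-1 times at O(log n) each,
// leaving the kth largest at the front: O(n + k log n) total.
// Assumes 1 <= k <= a.size().
int kth_largest_heap(std::vector<int> a, std::size_t k) {
    std::make_heap(a.begin(), a.end());   // O(n)
    auto end = a.end();
    for (std::size_t i = 0; i + 1 < k; ++i) {
        std::pop_heap(a.begin(), end);    // O(log n): max moves to end-1
        --end;
    }
    return a.front();                     // the kth largest
}
```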
+2  A: 

A Programmer's Companion to Algorithm Analysis gives a version that is O(n), although the author states that the constant factor is so high, you'd probably prefer the naive sort-the-list-then-select method.

I answered the letter of your question :)

Jimmy
A: 

You can do it in O(n + kn) = O(n) time (for constant k) and O(k) space, by keeping track of the k largest elements you've seen.

For each element in the array you can scan the list of k largest and replace the smallest element with the new one if it is bigger.

Warren's priority heap solution is neater though.

Rob Walker
This would have a worst case of O(n^2) where you're asked for the smallest item.
Elie
"Smallest item" means that k=n, so k is no longer constant.
Tyler McHenry
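A common refinement of this scan, not given in the answer itself, replaces the list of k largest with a min-heap of size k, so each of the n elements costs O(log k) instead of O(k). A sketch (names are mine):

```cpp
#include <functional>
#include <cstddef>
#include <queue>
#include <vector>

// Keep the k largest seen so far in a min-heap of size k. Each element is
// compared with the heap's minimum (the running kth largest) and swapped
// in if it is bigger: O(n log k) overall. Assumes 1 <= k <= a.size().
int kth_largest_bounded_heap(const std::vector<int>& a, std::size_t k) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap; // min-heap
    for (int e : a) {
        if (heap.size() < k) {
            heap.push(e);
        } else if (e > heap.top()) {  // e beats the current kth largest
            heap.pop();
            heap.push(e);
        }
    }
    return heap.top();  // smallest of the k largest = kth largest
}
```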
A: 

What I would do is this:

initialize empty doubly linked list l
for each element e in array
    if e larger than head(l)
        make e the new head of l
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element

You can simply store pointers to the first and last element in the linked list. They only change when updates to the list are made.

Update:

initialize empty sorted tree l
for each element e in array
    if e between head(l) and tail(l)
        insert e into l // O(log k)
        if size(l) > k
            remove last element from l

the last element of l should now be the kth largest element
Jasper Bekkers
What if e is smaller than head(l)? It could still be larger than the kth largest element, but would never get added to the list. You would need to keep the list of items sorted, in ascending order, for this to work.
Elie
You are right, guess I'll need to think this through some more. :-)
Jasper Bekkers
The solution would be to check if e is between head(l) and tail(l) and insert it at the correct position if it is. Making this O(kn). You could make it O(n log k) when using a binary tree that keeps track of the min and max elements.
Jasper Bekkers
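The updated "sorted tree" version above can be realized with a `std::multiset`, which keeps the k largest elements in order with O(log k) insertion and O(1) access to its minimum, for O(n log k) overall. This is my own sketch of the idea, not the answer's code:

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Maintain the k largest elements seen so far in a balanced tree.
// Assumes 1 <= k <= a.size().
int kth_largest_tree(const std::vector<int>& a, std::size_t k) {
    std::multiset<int> best;  // the k largest elements, in ascending order
    for (int e : a) {
        if (best.size() < k) {
            best.insert(e);                // O(log k)
        } else if (e > *best.begin()) {
            best.erase(best.begin());      // drop the current minimum
            best.insert(e);
        }
    }
    return *best.begin();  // the kth largest
}
```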
+8  A: 

The keywords you are looking for are selection algorithm: Wikipedia lists a number of different ways of doing this.

Adam Rosenfield
+6  A: 

This is called finding the k-th order statistic. There's a very simple randomized algorithm taking O(n) expected time, and a pretty complicated non-randomized algorithm taking O(n) worst-case time. There's some info on Wikipedia, but it's not very good. Everything you need is in these PowerPoint slides. It's also very nicely detailed in the book by Cormen et al. (Introduction to Algorithms).

eladv
+1  A: 

Read Chapter 9, Medians and Order Statistics, from Cormen's "Introduction to Algorithms", 2nd ed. It has an expected linear-time algorithm for selection. It's not something that people would randomly come up with in a few minutes. A heap sort, btw, won't work in O(n); it's O(n lg n).

+2  A: 

The C++ standard library has almost exactly that function, although it does modify your data. It has expected linear run-time, O(N), and it also does a partial sort.

#include <algorithm>  // std::nth_element

const int N = ...;
double a[N];
// ... 
const int m = ...; // m < N
std::nth_element (a, a + m, a + N);
// a[m] now holds the element that would be at index m if a were fully sorted
David Nehme
No, it has an expected *average* O(n) runtime. For example, quicksort is O(nlogn) on average with a worst case of O(n^2). Wow, something straight up factually wrong!
Just Some Guy
No, there's nothing factually wrong with this answer. It works and the C++ standard requires an expected linear run time.
David Nehme
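For reference, a complete, compilable variant of the sketch above (the function name and the largest-to-smallest index conversion are mine):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// std::nth_element partially sorts: afterwards a[m] holds the element that
// would be at index m in a fully sorted array, everything before it is
// <= a[m], and everything after is >= a[m]. Expected linear time.
// Assumes 1 <= k <= a.size().
int kth_largest_nth_element(std::vector<int> a, std::size_t k) {
    // the kth largest is the element at index n-k of the ascending order
    std::size_t m = a.size() - k;
    std::nth_element(a.begin(), a.begin() + m, a.end());
    return a[m];
}
```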
+3  A: 

If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis:

http://pine.cs.yale.edu/pinewiki/QuickSelect

Ying Xiao
+1, very detailed explanation indeed
Matthieu M.
+1, Jim Aspnes FTW
viksit
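A minimal quickselect along those lines might look like this (my own sketch, not the linked writeup's code; it uses a random pivot and a Hoare-style partition):

```cpp
#include <cstddef>
#include <cstdlib>
#include <utility>
#include <vector>

// Partition around a pivot, as in quicksort, then recurse into only the
// side containing the target index. Expected O(n); worst case O(n^2)
// with consistently unlucky pivots.
int quickselect(std::vector<int>& a, int lo, int hi, int idx) {
    if (lo == hi) return a[lo];
    int pivot = a[lo + std::rand() % (hi - lo + 1)];
    int i = lo, j = hi;
    while (i <= j) {                     // Hoare-style partition
        while (a[i] < pivot) ++i;
        while (a[j] > pivot) --j;
        if (i <= j) std::swap(a[i++], a[j--]);
    }
    if (idx <= j) return quickselect(a, lo, j, idx);
    if (idx >= i) return quickselect(a, i, hi, idx);
    return a[idx];                       // idx landed between j and i: done
}

// The kth largest is the element at index n-k of the ascending order.
// Assumes 1 <= k <= a.size().
int kth_largest_quickselect(std::vector<int> a, int k) {
    int n = static_cast<int>(a.size());
    return quickselect(a, 0, n - 1, n - k);
}
```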
A: 

I would like to suggest one answer.

Take the first k elements and sort them into a linked list of k values.

Now, for every other value: even in the worst case, insertion-sorting the remaining n-k values into that list takes k*(n-k) comparisons, and sorting the initial k values takes at most k*(k-1). The total is on the order of nk, which is O(n) for fixed k.

Cheers

Sorting takes O(n log n) time... the algorithm should run in linear time.
MrDatabase
+1  A: 

Find the median of the array in linear time, then use the partition procedure exactly as in quicksort to divide the array in two parts: values to the left of the median less than (<) the median, and values to the right greater than (>) it. That too can be done in linear time. Now recurse into the part of the array where the kth element lies. The recurrence becomes T(n) = T(n/2) + cn, which gives O(n) overall.

pranjal
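The recurrence T(n) = T(n/2) + cn unrolls to cn(1 + 1/2 + 1/4 + ...) <= 2cn, hence O(n). A quick numeric check of that bound (illustrative only):

```cpp
// Unroll T(n) = T(n/2) + c*n directly; the sum is a geometric series
// bounded by 2*c*n, so the work is linear in n.
double T(double n, double c) {
    if (n < 1) return 0;
    return T(n / 2, c) + c * n;
}
```

For n = 1024 and c = 1 this sums to 2047, just under the 2cn = 2048 bound.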
+1  A: 

You do it like quicksort. Pick an element at random and partition everything into the elements higher and lower than it. At that point you know which position the pivot actually landed in; if it is the kth element, you're done. Otherwise you repeat on the bin (higher or lower) that the kth element falls in. Statistically speaking, the expected time to find the kth element grows linearly with n: O(n).

stinky
+3  A: 

This is a problem of "order statistics". A good page describing the possible solutions, along with code and output, is at the link below.

http://www.rawkam.com/?p=870

sunil