views:

187

answers:

6

Hi! What´s the most efficient, elegant and pythonic way of solving this problem?

Given a list (or set or whatever) of n elements, we want to get the k biggest ones. ( You can assume k<n/2 without loss of generality, I guess) For example, if the list were:

l = [9,1,6,4,2,8,3,7,5]

n = 9, and let's say k = 3. What's the most efficient algorithm for retrieving the 3 biggest ones? In this case we should get [9,8,7], in no particular order.

Thanks! Manuel

+6  A: 
l = [9,1,6,4,2,8,3,7,5]

sorted(l)[-k:]
mshsayem
That is nicer than mine)
Rorick
… but it does not work in the very specific case of k==0. :)
EOL
+22  A: 

Use nlargest from heapq module

from heapq import nlargest
lst = [9,1,6,4,2,8,3,7,5]
nlargest(3, lst) # Gives [9,8,7]

You can also give a key to nlargest in case you wanna change your criteria:

from heapq import nlargest
tags = [ ("python", 30), ("ruby", 25), ("c++", 50), ("lisp", 20) ]
nlargest(2, tags, key=lambda e:e[1]) # Gives [ ("c++", 50), ("python", 30) ]
sharjeel
+1: I didn know abt this module... thanks
mshsayem
+3  A: 

You can use the heapq module.

>>> from heapq import heapify, nlargest
>>> l = [9,1,6,4,2,8,3,7,5]
>>> heapify(l)
>>> nlargest(3, l)
[9, 8, 7]
>>> 
sykora
+3  A: 
sorted(l, reverse=True)[:k]
Rorick
+4  A: 

The simple, O(n log n) way is to sort the list then get the last k elements.

The proper way is to use a selection algorithm, which runs in O(n + k log k) time.

Also, heapq.nlargest takes O(n log k) time, which may or may not be good enough.

(If k = O(n), then all 3 algorithms have the same complexity (i.e. don't bother). If k = O(log n), then the selection algorithm as described in Wikipedia is O(n) and heapq.nlargest is O(n log log n), but double logarithm is "constant enough" for most practical n that it doesn't matter.)

KennyTM
A: 

WARNING: NO GOOD FOR PERFORMANCE.
I am learning functional programming these days! :-)

Recursive function to get k max elements.

def rec_get_k_max(lst,k):
    if k == 0 or len(lst)==0 :
        return []
    else:
        m = max(lst)
        lst.remove(m)
        mx_lst = [m] + rec_get_k_max(lst,k-1)
        return mx_lst

if __name__=='__main__':
    ll = [9,1,6,4,2,8,3,7,5]
    print rec_get_k_max(ll,3)
TheMachineCharmer
what is `a`? `rec_get` shouldn't be `rec_get_k_max`?
fortran
isn't this a tail recursion ?
Adrien Plisson
@fortran `a` is replaced with `m`! Thanks!
TheMachineCharmer
The function call in main is still undefined. If you're learning functional programming, you should mind about not doing destructive operations on your parameters...
fortran
What you are talking about is `purely functional programming`! Thanks anyways! Its okay to modify data in `not so purely functional programming`!
TheMachineCharmer
@Adrien After reading wikipedia article on tail recursion I think it IS tail recursive. Thanks for introducing me to tail recursion.
TheMachineCharmer
@The Machine Charmer: i am glad you learned something, that was the whole point of my comment...
Adrien Plisson