Hi everyone,

How do I return the indices in the original list of the n largest items of an iterable?

heapq.nlargest(2, [100, 2, 400, 500, 400])

Desired output: [(3, 500), (2, 400)]

This already cost me a couple hours. I can't figure it out.

+1  A: 

You can use list.index in combination with map, which is fast for small n (beware that list.index returns the index of the first item in the list whose value is x, so duplicate values all map to the same index):

>>> iterable = [100, 2, 400, 500, 400]
>>> map(iterable.index, heapq.nlargest(2, iterable))
[3, 2]

To see the associated values as well:

>>> map(lambda n: (n, iterable.index(n)), heapq.nlargest(2, iterable))
[(500, 3), (400, 2)]

For larger n see @SilentGhost's post.
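
To make the list.index caveat concrete, here is a minimal sketch with n = 3, where the value 400 appears twice among the largest items. It uses a list comprehension instead of map so that it also runs on Python 3 (an assumption on my part; the session above is Python 2):

```python
import heapq

iterable = [100, 2, 400, 500, 400]

# The three largest values; 400 occurs twice.
largest = heapq.nlargest(3, iterable)

# list.index always returns the position of the FIRST match,
# so both 400s resolve to index 2 and index 4 is never reported.
via_index = [iterable.index(v) for v in largest]

print(largest)    # [500, 400, 400]
print(via_index)  # [3, 2, 2]
```

So this approach is only safe when the values involved are unique.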


Edit: Benchmarked some solutions:

#!/usr/bin/env python
import heapq
from timeit import Timer

seq = [100, 2, 400, 500, 400]

def a(seq):
    """returns [(3, 500), (2, 400)]"""
    return heapq.nlargest(2, enumerate(seq), key=lambda x: x[1])

def b(seq):
    """returns [3, 2]"""
    return map(seq.index, heapq.nlargest(2, seq))

def c(seq):
    """returns [(500, 3), (400, 2)]"""
    return map(lambda n: (n, seq.index(n)), heapq.nlargest(2, seq))

if __name__ == '__main__':
    _a = Timer("a(seq)", "from __main__ import a, seq")
    _b = Timer("b(seq)", "from __main__ import b, seq")
    _c = Timer("c(seq)", "from __main__ import c, seq") 

    loops = 1000000

    print _a.timeit(number=loops)
    print _b.timeit(number=loops)
    print _c.timeit(number=loops)

    # Core i5, 2.4GHz, Python 2.6, Darwin
    # 8.92712688446
    # 5.64332985878
    # 6.50824809074
The MYYN
Great, thanks a lot. It works.
Joey
that's not only very inefficient, it barely works
SilentGhost
@SilentGhost, please explain. At least in a simple benchmark `iterable.index` seems nearly twice as fast (see my edit).
The MYYN
@Roque: what is your benchmark worth if you're comparing a solution that works (mine) with solutions that don't (yours)? Despite linking to the `index` docs, you probably haven't noticed a very important bit there: "Return the index in the list of the **first** item [...]"
SilentGhost
@Roque: well, sure, when `n = 2` you see what you see, but how does your "solution" perform with `n = 10`?
SilentGhost
`n = 10` yields the same performance results. For an array of length 10000 and `n = 100` you win, congrats ;)
The MYYN
see also: http://gist.github.com/457086
The MYYN
+3  A: 
>>> seq = [100, 2, 400, 500, 400]
>>> heapq.nlargest(2, enumerate(seq), key=lambda x: x[1])
[(3, 500), (2, 400)]
SilentGhost
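
One possible way to wrap the enumerate approach above in a reusable function, as a sketch only: the name `nlargest_with_index` is hypothetical, and `operator.itemgetter(1)` is simply swapped in for the lambda. Because each (index, value) pair is ranked as a whole, duplicate values keep their own positions:

```python
import heapq
from operator import itemgetter


def nlargest_with_index(n, iterable):
    """Return up to n (index, value) pairs, largest values first.

    Hypothetical helper name, not part of heapq.
    """
    # enumerate() attaches each item's original position;
    # itemgetter(1) tells nlargest to rank the pairs by value only.
    return heapq.nlargest(n, enumerate(iterable), key=itemgetter(1))


seq = [100, 2, 400, 500, 400]
pairs = nlargest_with_index(3, seq)
print(pairs)
```

With n = 3 the result contains both occurrences of 400, at indices 2 and 4, alongside (3, 500).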