I need to get the lesser n numbers of a list in Python. I need this to be really fast because it's in a critical part for performance and it needs to be repeated a lot of times.
n is usually no greater than 10 and the list usually has around 20000 elements. The list is always different each time I call the function. Sorting can't be made in place.
Initially, I have written this function:
def mins(items, n):
mins = [float('inf')]*n
for item in items:
for i, min in enumerate(mins):
if item < min:
mins.insert(i, item)
mins.pop()
break
return mins
But this function can't beat a simple sorted(items)[:n] which sort the entire list. Here is my test:
from random import randint, random
import time
test_data = [randint(10, 50) + random() for i in range(20000)]
init = time.time()
mins = mins(test_data, 8)
print 'mins(items, n):', time.time() - init
init = time.time()
mins = sorted(test_data)[:8]
print 'sorted(items)[:n]:', time.time() - init
Results:
mins(items, n): 0.0632939338684
sorted(items)[:n]: 0.0231449604034
sorted()[:n] is three times faster. I believe this is because:
- insert() operation is costly because Python lists are not linked lists.
- sorted() is an optimized c function and mine is pure python.
Is there any way to beat sorted()[:n] ? Should I use a C extension, or Pyrex or Psyco or something like that?
Thanks in advance for your answers.