views: 199

answers: 5
Hello!

I'm looking for a Pythonic way of iterating over the first n items of an iterable (update: not a list in the general case, since for lists this is trivial), and it's quite important to do this as fast as possible. This is how I do it now:

count = 0
for item in iterable:
    do_something(item)
    count += 1
    if count >= n: break

This doesn't seem neat to me. Another way of doing it is:

for item in itertools.islice(iterable, n):
    do_something(item)

This looks good, but the question is: is it fast enough to use with generators? For example:

pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)
for item in itertools.islice(pair_generator(iterable), n):
    do_something(item)
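
(For clarity: pair_generator just groups the iterable into consecutive pairs, e.g.

list(pair_generator(range(6)))  # [(0, 1), (2, 3), (4, 5)]

and islice then limits how many of those pairs are processed.)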

Will it run fast enough as compared to the first method? Is there some easier way to do it?

+1  A: 

Of a list? Try

for k in mylist[0:n]:
    # do stuff with k

You can also use a comprehension if you need one:

my_new_list = [blah(k) for k in mylist[0:n]]
Arrieta
Yes, I got it. Wrong title, sorry, my bad.
martinthenext
+3  A: 

If it's a list then you can use slicing:

list[:n]
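
For instance, with a list mylist and the do_something from the question (a trivial sketch):

for item in mylist[:n]:
    do_something(item)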
Mark Byers
+5  A: 

for item in itertools.islice(iterable, n): is the most obvious, easy way to do it. It works for arbitrary iterables and is O(n), as any sane solution would be.

It's conceivable that another solution could have better performance; we wouldn't know without timing. I wouldn't recommend bothering with timing unless you profile your code and find this call to be a hotspot. Unless it's buried within an inner loop, it is highly doubtful that it will be. Premature optimization is the root of all evil.


If I were going to look for alternative solutions, I would look at ones like `for count, item in enumerate(iterable): if count >= n: break ...` and `for i in xrange(n): item = next(iterator) ...`. I wouldn't guess these would help, but they seem worth trying if we really want to compare things. If I were stuck in a situation where I had profiled and found this call was a hotspot in an inner loop (is this really your situation?), I would also try to avoid looking up the islice attribute on the global itertools module each time, by binding the function to a local name ahead of time, as sketched below.
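
For concreteness, here is a rough sketch of those alternatives (illustrative only -- the helper names are mine; iterable, n, and do_something are the names from the question):

import itertools

# enumerate-based: count items and stop once n have been processed
def first_n_enumerate(iterable, n, do_something):
    for count, item in enumerate(iterable):
        if count >= n:
            break
        do_something(item)

# next-based: pull exactly n items off the iterator
# (raises StopIteration if the iterable has fewer than n items)
def first_n_next(iterable, n, do_something):
    iterator = iter(iterable)
    for i in xrange(n):
        do_something(next(iterator))

# micro-optimization: bind itertools.islice to a local name up front
def first_n_islice(iterable, n, do_something, _islice=itertools.islice):
    for item in _islice(iterable, n):
        do_something(item)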

These are things you only do after you've proven they'll help. People often try them anyway, and it doesn't make their programs appreciably faster; it just makes their programs worse.

Mike Graham
Thanks a lot for the answer, it helped me a lot!
martinthenext
Well, using enumerate looks quite good to me too! As for profiling and finding hotspots, this is not actually my case; I just expect some loops in my code to have enormous iteration counts, which is why I asked the question. Now I get it - it was a mistake to try optimizing at this stage; I've got to finish the code and test it, and only then optimize things, if needed. Thanks again for your help.
martinthenext
+5  A: 

itertools tends to be the fastest solution, when directly applicable.

Obviously, the only way to check is to benchmark -- e.g., save the following as aaa.py:

import itertools

def doit1(iterable, n, do_something=lambda x: None):
    count = 0
    for item in iterable:
        do_something(item)
        count += 1
        if count >= n: break

def doit2(iterable, n, do_something=lambda x: None):
    for item in itertools.islice(iterable, n):
        do_something(item)

pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)

def dd1(itrbl=range(44)): doit1(itrbl, 23)
def dd2(itrbl=range(44)): doit2(itrbl, 23)

and see...:

$ python -mtimeit -s'import aaa' 'aaa.dd1()'
100000 loops, best of 3: 8.82 usec per loop
$ python -mtimeit -s'import aaa' 'aaa.dd2()'
100000 loops, best of 3: 6.33 usec per loop

so clearly, itertools is faster here -- benchmark with your own data to verify.
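
If you want to extend the benchmark to the pair_generator case from the question, a sketch might look like this (add to aaa.py; the dd3/dd4 helpers and the count of 11 pairs are my own choices, not timed here):

def doit3(iterable, n, do_something=lambda x: None):
    # first method from the question, applied to the pair generator
    count = 0
    for item in pair_generator(iterable):
        do_something(item)
        count += 1
        if count >= n: break

def doit4(iterable, n, do_something=lambda x: None):
    # islice applied to the pair generator
    for item in itertools.islice(pair_generator(iterable), n):
        do_something(item)

def dd3(itrbl=range(44)): doit3(itrbl, 11)
def dd4(itrbl=range(44)): doit4(itrbl, 11)

and time them the same way:

$ python -mtimeit -s'import aaa' 'aaa.dd3()'
$ python -mtimeit -s'import aaa' 'aaa.dd4()'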

BTW, I find timeit MUCH more usable from the command line, so that's how I always use it -- it then runs the right "order of magnitude" of loops for the kind of speeds you're trying to measure, be that 10, 100, 1000, and so on -- here, to distinguish a couple of microseconds of difference, a hundred thousand loops is about right.

Alex Martelli
Weird, it's just against my C++-ish intuition to see a simple solution run slower than a neat one. Python is the coolest language, indeed. This is a great addition to Mike Graham's advice not to do premature optimization. I guess the general rule is to write what's neat, without thinking about running time.
martinthenext
@martin, personally, I think a _lot_ about running time (mostly in terms of big-O for scalability) -- but, in general, the most Pythonic idioms are often going to be the ones that have been most optimized, because they're usually the ones us Python committers tend to care most about (Hettinger, the author of itertools and other speedy parts of Python, has been quite active in that field in recent years, as was Peters in earlier years, but it's a pretty general phenomenon in the Python-committer community).
Alex Martelli
+2  A: 

You can use enumerate to write essentially the same loop you have, but in a simpler, more Pythonic way:

for idx, val in enumerate(iterableobj):
    if idx >= n:
        break
    do_something(val)
Michael Aaron Safyan
This option has been discussed above and looks good, but I think `islice` is better because it doesn't require any additional variables in the loop body, which makes it look clearer to me.
martinthenext