views: 199

answers: 5
Hello!

I'm looking for a Pythonic way of iterating over the first n items of an iterable (update: not a list in the general case, since for lists this is trivial), and it's quite important to do this as fast as possible. This is how I do it now:

count = 0
for item in iterable:
    do_something(item)
    count += 1
    if count >= n: break

This doesn't seem neat to me. Another way of doing it is:

for item in itertools.islice(iterable, n):
    do_something(item)

This looks good, but the question is: is it fast enough to use with generators? For example:

pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)
for item in itertools.islice(pair_generator(iterable), n):
    do_something(item)
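
(For clarity: pair_generator just groups the iterable into consecutive pairs, e.g.

list(pair_generator(range(6)))  # [(0, 1), (2, 3), (4, 5)]

and islice then limits how many of those pairs are processed.)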

Will it run fast enough as compared to the first method? Is there some easier way to do it?

+1  A: 

Of a list? Try

for k in mylist[0:n]:
    # do stuff with k

You can also use a comprehension if you need one:

my_new_list = [blah(k) for k in mylist[0:n]]
Arrieta
Yes, I got it. Wrong title, sorry, my bad.
martinthenext
+3  A: 

If it's a list then you can use slicing:

list[:n]
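
For instance, with a list mylist and the do_something from the question (a trivial sketch):

for item in mylist[:n]:
    do_something(item)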
Mark Byers
+5  A: 

for item in itertools.islice(iterable, n): is the most obvious, easy way to do it. It works for arbitrary iterables and is O(n), as any sane solution would be.

It's conceivable that another solution could have better performance; we wouldn't know without timing. I wouldn't recommend bothering with timing unless you profile your code and find this call to be a hotspot. Unless it's buried within an inner loop, it is highly doubtful that it will be. Premature optimization is the root of all evil.


If I were going to look for alternative solutions, I would look at ones like `for count, item in enumerate(iterable): if count >= n: break ...` and `for i in xrange(n): item = next(iterator) ...`. I wouldn't guess these would help, but they seem worth trying if we really want to compare things. If I were stuck in a situation where I had profiled and found this call was a hotspot in an inner loop (is this really your situation?), I would also try to avoid looking up the islice attribute on the global itertools module each time, by binding the function to a local name ahead of time, as sketched below.
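
For concreteness, here is a rough sketch of those alternatives (illustrative only -- the helper names are mine; iterable, n, and do_something are the names from the question):

import itertools

# enumerate-based: count items and stop once n have been processed
def first_n_enumerate(iterable, n, do_something):
    for count, item in enumerate(iterable):
        if count >= n:
            break
        do_something(item)

# next-based: pull exactly n items off the iterator
# (raises StopIteration if the iterable has fewer than n items)
def first_n_next(iterable, n, do_something):
    iterator = iter(iterable)
    for i in xrange(n):
        do_something(next(iterator))

# micro-optimization: bind itertools.islice to a local name up front
def first_n_islice(iterable, n, do_something, _islice=itertools.islice):
    for item in _islice(iterable, n):
        do_something(item)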

These are things you only do after you've proven they'll help. People often try them anyway, and it doesn't make their programs appreciably faster; it just makes their programs worse.

Mike Graham
Thanks a lot for the answer, it helped me a lot!
martinthenext
Well, using enumerate looks quite good to me too! As for profiling and finding hotspots, this is not actually my case; I just expect some loops in my code to have enormous iteration counts, which is why I asked the question. Now I get it - it was a mistake to try optimizing at this stage; I've got to finish the code and test it, and only then optimize things, if needed. Thanks again for your help.
martinthenext
+5  A: 

itertools tends to be the fastest solution, when directly applicable.

Obviously, the only way to check is to benchmark -- e.g., save the following as aaa.py:

import itertools

def doit1(iterable, n, do_something=lambda x: None):
    count = 0
    for item in iterable:
        do_something(item)
        count += 1
        if count >= n: break

def doit2(iterable, n, do_something=lambda x: None):
    for item in itertools.islice(iterable, n):
        do_something(item)

pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)

def dd1(itrbl=range(44)): doit1(itrbl, 23)
def dd2(itrbl=range(44)): doit2(itrbl, 23)

and see...:

$ python -mtimeit -s'import aaa' 'aaa.dd1()'
100000 loops, best of 3: 8.82 usec per loop
$ python -mtimeit -s'import aaa' 'aaa.dd2()'
100000 loops, best of 3: 6.33 usec per loop

so clearly, itertools is faster here -- benchmark with your own data to verify.
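
If you want to extend the benchmark to the pair_generator case from the question, a sketch might look like this (add to aaa.py; the dd3/dd4 helpers and the count of 11 pairs are my own choices, not timed here):

def doit3(iterable, n, do_something=lambda x: None):
    # first method from the question, applied to the pair generator
    count = 0
    for item in pair_generator(iterable):
        do_something(item)
        count += 1
        if count >= n: break

def doit4(iterable, n, do_something=lambda x: None):
    # islice applied to the pair generator
    for item in itertools.islice(pair_generator(iterable), n):
        do_something(item)

def dd3(itrbl=range(44)): doit3(itrbl, 11)
def dd4(itrbl=range(44)): doit4(itrbl, 11)

and time them the same way:

$ python -mtimeit -s'import aaa' 'aaa.dd3()'
$ python -mtimeit -s'import aaa' 'aaa.dd4()'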

BTW, I find timeit MUCH more usable from the command line, so that's how I always use it -- it then runs the right "order of magnitude" of loops for the kind of speeds you're trying to measure, be that 10, 100, 1000, and so on -- here, to distinguish a couple of microseconds of difference, a hundred thousand loops is about right.

Alex Martelli
Weird, it's just against my C++-ish intuition to see a simple solution run slower than a neat one. Python is the coolest language, indeed. This is a great addition to Mike Graham's advice not to do premature optimization. I guess the general rule is to write what's neat, without thinking about running time.
martinthenext
@martin, personally, I think a _lot_ about running time (mostly in terms of big-O for scalability) -- but, in general, the most Pythonic idioms are often going to be the ones that have been most optimized, because they're usually the ones us Python committers tend to care most about (Hettinger, the author of itertools and other speedy parts of Python, has been quite active in that field in recent years, as was Peters in earlier years, but it's a pretty general phenomenon in the Python-committer community).
Alex Martelli
+2  A: 

You can use enumerate to write essentially the same loop you have, but in a simpler, more Pythonic way:

for idx, val in enumerate(iterableobj):
    if idx >= n:
        break
    do_something(val)
Michael Aaron Safyan
This option has been discussed above and looks good, but I think `islice` is better because it doesn't require any additional variables in the loop body, which makes it look clearer to me.
martinthenext