views: 178

answers: 2

What is happening? Can somebody explain to me what is going on here? In a tight loop I changed:

##            j=i
##            while j < ls - 1 and len(wordlist[j]) > lc: j+=1
            j = next(j for j in range(i,ls) if len(wordlist[j]) <=  lc)

The commented-out while version ran the whole program in 625 ms; the next/generator version ran the whole program in 2.125 s.

What could be the reason that this more Pythonic version causes such a catastrophic drop in performance?

EDIT: Maybe it is caused by the use of the psyco module? At least the running time with Python 2.7, which does not have psyco, was 2.141 s for the next version, which is almost the same as Python 2.6 with psyco.

After removing the *.pyc files I could not get the code to slow down. Then, when I also removed the import of psyco from the library module, I got the 2.6 timings for the runs without psyco as well. Results for the non-psyco and psyco versions (since the library routine now slows down too, its timing is also relevant):

not psyco:

  1. while: preparation in library: 532 ms, total running time 2.625 s
  2. next: preparation in library: 532 ms, total running time (time.clock()): 2.844 s (the version with xrange had the same wall time)

psyco:

  1. while: preparation in library: 297 ms, total running time : 609..675 ms
  2. next: preparation in library: 297 ms, total running time: 1.922 s (the version with range instead of xrange everywhere in the program: 1.985 s)

Running on a Windows XP AMD Sempron 3100+ system with 2 GB RAM. Counting the loops and calls with two globals:

    j = i
    callcount += 1                  # how many times this point is reached
    while j < ls - 1 and len(wordlist[j]) > lc:
        j += 1
        loopcount += 1              # how many iterations the inner while loop makes

Result for the test input with psyco:

Finished in 625 ms
Loopcount: 78317
Callcount: 47970
Ratio: 1.633

So the loop is inside a tight loop, but on average it is executed only a couple of times (note that the two increments of the global counters did not slow down the code under psyco).

CONCLUSIONS: Despite the algorithm being highly sensitive to vocabulary length, which led me to drop some impossible words from consideration with this loop, the base cases of the recursion are later checked by dictionary lookup, which is O(n); therefore the earlier, highly beneficial optimization becomes not very beneficial. Even with longer input, moving the callcount counter to the beginning of the function showed that the call count is not affected by the vocabulary length, while the outer loop count is slightly reduced (the code originally posted is in the elif part of an if statement).
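To show what I mean by checking base cases with a dictionary lookup, here is a toy splitter; it is only illustrative, not my actual program, and all names are made up:

    # Purely illustrative, not the actual program: a recursive splitter whose
    # base case is resolved by a dictionary lookup rather than by a scan.
    def can_split(text, worddict):
        if text in worddict:          # base case: a single dictionary lookup
            return True
        return any(text[:k] in worddict and can_split(text[k:], worddict)
                   for k in range(1, len(text)))

    worddict = {"cat": 1, "cats": 1, "dog": 1}
    print(can_split("catsdog", worddict))   # True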

Longer runs (29,372 solutions), with the while loop and with the whole loop removed (using i instead of j) (library preparation 312 ms):

  1. Without the loop: elif branch count: 485488, outerloopcount: 10129147, ratio: 0.048, runtime 6.000 s (without counters: 4.594 s)
  2. With the loop: loopcount: 19355114, outercount: 8194033, ratio: 0.236, runtime 5.704 s (without counters: 4.688 s)

(running time without the loop, counters, and psyco: 32.792 s; library 608 ms)

So without the extra counters, the benefit of this loop under psyco in the harder case is (4688 - 4594) * 100 / 4688.0 % = 2 %.

This inspired me to reverse another earlier optimization, which I had already wondered about on DaniWeb. An earlier version of the code ran faster when the smallest word size was a global, not a parameter. According to the documentation, local variable access is faster, but apparently the cost of making the recursion heavier outweighed that. In the harder case this other reversal of an optimization brought more expected performance behaviour for the case with no word-length optimization: the run time with psyco was 312 ms preparation and 4.469..4.484 s total running time. So this made the code cleaner and brought more benefit in this case than the removed loop had. Putting the parameter back into the version with the while loop did not change the running time much (though the variation became greater for the library preparation code).
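As a toy comparison (made-up function names and sizes, nothing like the real program), the two styles can be benchmarked like this:

    # Toy benchmark, not the real program: pass the constant down the
    # recursion as a parameter versus reading it from a module-level global.
    import timeit

    MIN_LEN = 3   # the "global" variant of the constant

    def depth_param(n, min_len):
        # the constant travels as an extra argument on every recursive call
        if n <= min_len:
            return 0
        return 1 + depth_param(n - 1, min_len)

    def depth_global(n):
        # the constant is looked up in the module namespace instead
        if n <= MIN_LEN:
            return 0
        return 1 + depth_global(n - 1)

    setup = "from __main__ import depth_param, depth_global"
    print(timeit.timeit("depth_param(500, 3)", setup=setup, number=2000))
    print(timeit.timeit("depth_global(500)", setup=setup, number=2000))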

**What I learned from this: if you make n optimizations for speed, you must re-check the first n-1 optimizations after making the nth one.**
A: 

The two are not equivalent.

j=i
while j < ls - 1 and len(wordlist[j]) > lc: 
    j+=1

will stop the while loop as soon as len(wordlist[j]) <= lc. It could conceivably go through the loop zero times if the first word it looks at is already no longer than lc characters.

j = next(j for j in range(i,ls) if len(wordlist[j]) <=  lc)

will continue iterating through the whole range i to ls, regardless of the length of the words in the list.

Edit: Ignore the above; as Amber pointed out, the call to next() means that the generator expression is only evaluated up to the point where the first result is returned. In that case I suspect the time difference comes from using range() instead of xrange() (unless this is Python 3.x). In Python 2.x, range() will create the full list in memory, even though the generator expression only returns the first value.
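A small sketch with made-up data illustrates the point: next() stops the generator at the first matching index, but on Python 2.x range() has already built the whole index list before the search even starts.

    # Made-up data, just to illustrate the point made above.
    wordlist = ["ab"] + ["abcdef"] * 1000000   # the very first word is short
    i, ls, lc = 0, len(wordlist), 3

    j = next(j for j in range(i, ls) if len(wordlist[j]) <= lc)
    print(j)   # 0 -- only one word was inspected, yet on Python 2.x range()
               # has already allocated a list of about a million integers

    # xrange() (Python 2) or range() (Python 3) avoids the up-front allocation:
    # j = next(j for j in xrange(i, ls) if len(wordlist[j]) <= lc)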

Dave Kirby
Not actually true. Generators are lazily-evaluated, and thus calling `next()` will only grab the first element in the result, which means that the generator won't evaluate anything beyond where the `if` condition is true.
Amber
@Amber: damn, you are right. I completely overlooked the call to next().
Dave Kirby
+2  A: 

I've found that using generators can often be slower than generating the whole list, which is a little counter-intuitive. I've managed to fix performance bottlenecks just by adding a [] pair.

For example compare these:

$ python -m timeit -n 1000 "' '.join(c for c in 'hello world')"
1000 loops, best of 3: 6.11 usec per loop
$ python -m timeit -n 1000 "' '.join([c for c in 'hello world'])"
1000 loops, best of 3: 3.79 usec per loop

It's almost twice as quick to generate the whole list first rather than use a generator even for such a simple case!

Edit: As Thomas Wouters points out in the comments, the reason the generator is slower here is that it's such a simple case. For balance, here is his test, in which the generator is the clear winner:

$ python -m timeit -s "s = 'hello world' * 10000" -s "class C: pass" "for i in (C() for c in s): pass"
10 loops, best of 3: 33.6 msec per loop
$ python -m timeit -s "s = 'hello world' * 10000" -s "class C: pass" "for i in [C() for c in s]: pass"
10 loops, best of 3: 172 msec per loop
Scott Griffiths
Yes, a generator has to do a tiny bit of extra work for each element compared with creating a list and then iterating over it. However, whether that is enough to notice depends greatly on how well the full list fits in memory (which is not easy to see just by looking at the code). In your example the list is tiny, creating the full list is fast, and you're really only measuring the speed of the iteration (you don't spend time anywhere else). Try it with, say, `python -m timeit -s "s = 'hello world' * 10000" "' '.join(c for c in s)"` instead and you'll see the generator can be quite a bit faster.
Thomas Wouters
@Thomas: Good points, but I still get the generator being slower for your example (11 ms vs. 8 ms), and increasing the string length further doesn't change this.
Scott Griffiths
Although you may want to change the loop to `c for c in s if 0` to reduce the noise of creating the result string :)
Thomas Wouters
Yeah, I forgot about the optimizations involved here, especially string interning. The difference won't be easily noticeable with this minimal amount of work; you need to grow the list *itself* beyond what fits into memory, since the strings don't take up any extra memory. Using something other than a string may also show a bigger difference.
Thomas Wouters
Here's a version that shows the difference when not just caching strings: http://paste.pocoo.org/show/273935/
Thomas Wouters