views: 178

answers: 2

What is happening? Can somebody explain to me what is going on here? In a tight loop I changed:

##            j=i
##            while j < ls - 1 and len(wordlist[j]) > lc: j+=1
            j = next(j for j in range(i,ls) if len(wordlist[j]) <=  lc)

The commented-out while version ran the whole program in 625 ms; the next/generator version ran the whole program in 2.125 s.

What could be the reason that this more Pythonic version causes such a catastrophic drop in performance?

EDIT: Maybe it is caused by the use of the psyco module? At least the running time with Python 2.7, which does not have psyco, was 2.141 s for the next version, which is almost the same as Python 2.6 with psyco.

After removing the *.pyc files I could not get the code to slow down. Then, when I also removed the import of psyco from the library module, I got the 2.6 timings for the runs without psyco as well. Results for the non-psyco and psyco versions (since the library routine now slows down too, its timing is also relevant):

not psyco:

  1. while: preparation in library: 532 ms, total running time 2.625 s
  2. next: preparation in library: 532 ms, total running time (time.clock()): 2.844 s (the version with xrange had the same wall time)

psyco:

  1. while: preparation in library: 297 ms, total running time : 609..675 ms
  2. next: preparation in library: 297 ms, total running time: 1.922 s (the version with range instead of xrange everywhere in the program: 1.985 s)

Running on a Windows XP AMD Sempron 3100+ system with 2 GB RAM. Counting the loops and calls with two globals:

    j = i
    callcount += 1                  # how many times this point is reached
    while j < ls - 1 and len(wordlist[j]) > lc:
        j += 1
        loopcount += 1              # how many iterations the inner while loop makes

Result for the test input with psyco:

Finished in 625 ms
Loopcount: 78317
Callcount: 47970
Ratio: 1.633

So the loop is inside a tight loop, but on average it is executed only a couple of times (note that the two increments of the global counters did not slow down the code under psyco).

CONCLUSIONS: Despite the algorithm being highly sensitive to vocabulary length, which led me to drop some impossible words from consideration with this loop, the base cases of the recursion are later checked by dictionary lookup, which is O(n); therefore the earlier, highly beneficial optimization becomes not very beneficial. Even with longer input, moving the callcount counter to the beginning of the function showed that the call count is not affected by the vocabulary length, while the outer loop count is slightly reduced (the code originally posted is in the elif part of an if statement).
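To show what I mean by checking base cases with a dictionary lookup, here is a toy splitter; it is only illustrative, not my actual program, and all names are made up:

    # Purely illustrative, not the actual program: a recursive splitter whose
    # base case is resolved by a dictionary lookup rather than by a scan.
    def can_split(text, worddict):
        if text in worddict:          # base case: a single dictionary lookup
            return True
        return any(text[:k] in worddict and can_split(text[k:], worddict)
                   for k in range(1, len(text)))

    worddict = {"cat": 1, "cats": 1, "dog": 1}
    print(can_split("catsdog", worddict))   # True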

Longer runs (29,372 solutions), with the while loop and with the whole loop removed (using i instead of j) (library preparation 312 ms):

  1. Without the loop: elif branch count: 485488, outerloopcount: 10129147, ratio: 0.048, runtime 6.000 s (without counters: 4.594 s)
  2. With the loop: loopcount: 19355114, outercount: 8194033, ratio: 0.236, runtime 5.704 s (without counters: 4.688 s)

(running time without the loop, counters, and psyco: 32.792 s; library 608 ms)

So without the extra counters, the benefit of this loop under psyco in the harder case is (4688 - 4594) * 100 / 4688.0 % = 2 %.

This inspired me to reverse another earlier optimization, which I had already wondered about on DaniWeb. An earlier version of the code ran faster when the smallest word size was a global, not a parameter. According to the documentation, local variable access is faster, but apparently the cost of making the recursion heavier outweighed that. In the harder case this other reversal of an optimization brought more expected performance behaviour for the case with no word-length optimization: the run time with psyco was 312 ms preparation and 4.469..4.484 s total running time. So this made the code cleaner and brought more benefit in this case than the removed loop had. Putting the parameter back into the version with the while loop did not change the running time much (though the variation became greater for the library preparation code).
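As a toy comparison (made-up function names and sizes, nothing like the real program), the two styles can be benchmarked like this:

    # Toy benchmark, not the real program: pass the constant down the
    # recursion as a parameter versus reading it from a module-level global.
    import timeit

    MIN_LEN = 3   # the "global" variant of the constant

    def depth_param(n, min_len):
        # the constant travels as an extra argument on every recursive call
        if n <= min_len:
            return 0
        return 1 + depth_param(n - 1, min_len)

    def depth_global(n):
        # the constant is looked up in the module namespace instead
        if n <= MIN_LEN:
            return 0
        return 1 + depth_global(n - 1)

    setup = "from __main__ import depth_param, depth_global"
    print(timeit.timeit("depth_param(500, 3)", setup=setup, number=2000))
    print(timeit.timeit("depth_global(500)", setup=setup, number=2000))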

**What I learned from this: if you make n optimizations for speed, you must re-check the first n-1 optimizations after making the nth one.**
A: 

The two are not equivalent.

j=i
while j < ls - 1 and len(wordlist[j]) > lc: 
    j+=1

will stop the while loop as soon as len(wordlist[j]) <= lc. It could conceivably go through the loop zero times if the first word it looks at is already no longer than lc characters.

j = next(j for j in range(i,ls) if len(wordlist[j]) <=  lc)

will continue iterating through the whole range i to ls, regardless of the length of the words in the list.

Edit: Ignore the above; as Amber pointed out, the call to next() means that the generator expression is only evaluated up to the point where the first result is returned. In that case I suspect the time difference comes from using range() instead of xrange() (unless this is Python 3.x). In Python 2.x, range() will create the full list in memory, even though the generator expression only returns the first value.
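A small sketch with made-up data illustrates the point: next() stops the generator at the first matching index, but on Python 2.x range() has already built the whole index list before the search even starts.

    # Made-up data, just to illustrate the point made above.
    wordlist = ["ab"] + ["abcdef"] * 1000000   # the very first word is short
    i, ls, lc = 0, len(wordlist), 3

    j = next(j for j in range(i, ls) if len(wordlist[j]) <= lc)
    print(j)   # 0 -- only one word was inspected, yet on Python 2.x range()
               # has already allocated a list of about a million integers

    # xrange() (Python 2) or range() (Python 3) avoids the up-front allocation:
    # j = next(j for j in xrange(i, ls) if len(wordlist[j]) <= lc)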

Dave Kirby
Not actually true. Generators are lazily-evaluated, and thus calling `next()` will only grab the first element in the result, which means that the generator won't evaluate anything beyond where the `if` condition is true.
Amber
@Amber: damn, you are right. I completely overlooked the call to next().
Dave Kirby
+2  A: 

I've found that using generators can often be slower than generating the whole list, which is a little counter-intuitive. I've managed to fix performance bottlenecks just by adding a [] pair.

For example compare these:

$ python -m timeit -n 1000 "' '.join(c for c in 'hello world')"
1000 loops, best of 3: 6.11 usec per loop
$ python -m timeit -n 1000 "' '.join([c for c in 'hello world'])"
1000 loops, best of 3: 3.79 usec per loop

It's almost twice as quick to generate the whole list first rather than use a generator even for such a simple case!

Edit: As Thomas Wouters points out in the comments, the reason the generator is slower here is that it's such a simple case. For balance, here is his test, in which the generator is the clear winner:

$ python -m timeit -s "s = 'hello world' * 10000" -s "class C: pass" "for i in (C() for c in s): pass"
10 loops, best of 3: 33.6 msec per loop
$ python -m timeit -s "s = 'hello world' * 10000" -s "class C: pass" "for i in [C() for c in s]: pass"
10 loops, best of 3: 172 msec per loop
Scott Griffiths
Yes, a generator has to do a tiny bit of extra work for each element compared with creating a list and then iterating over it. However, whether that is enough to notice depends greatly on how well the full list fits in memory (which is not easy to see just by looking at the code). In your example the list is tiny, creating the full list is fast, and you're really only measuring the speed of the iteration (you don't spend time anywhere else). Try it with, say, `python -m timeit -s "s = 'hello world' * 10000" "' '.join(c for c in s)"` instead and you'll see the generator can be quite a bit faster.
Thomas Wouters
@Thomas: Good points, but I still get the generator being slower for your example (11 ms vs. 8 ms), and increasing the string length further doesn't change this.
Scott Griffiths
Although you may want to change the loop to `c for c in s if 0` to reduce the noise of creating the result string :)
Thomas Wouters
Yeah, I forgot about the optimizations involved here, especially string interning. The difference won't be easily noticeable with this minimal amount of work; you need to grow the list *itself* beyond what fits into memory, since the strings don't take up any extra memory. Using something other than a string may also show a bigger difference.
Thomas Wouters
Here's a version that shows the difference when not just caching strings: http://paste.pocoo.org/show/273935/
Thomas Wouters