What is happening here? Can somebody explain what is going on? In a tight loop I changed:
    ## old version: advance j past words longer than lc
    ## j = i
    ## while j < ls - 1 and len(wordlist[j]) > lc: j += 1
    # new version: same idea written as a generator expression
    j = next(j for j in range(i, ls) if len(wordlist[j]) <= lc)
With the commented-out while version the whole program ran in 625 ms; with the next() generator version it ran in 2.125 s.
What can be the reason that this more Pythonic version causes such a catastrophic drop in performance?
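To isolate the difference between the two forms, here is a minimal, self-contained comparison with timeit; the wordlist contents and the values of i, ls and lc are invented stand-ins for the real data, so only the relative numbers matter:

    import timeit

    setup = """
    wordlist = ['abcdef', 'abcde', 'abcd', 'abc', 'ab'] * 1000
    ls = len(wordlist)
    i, lc = 0, 3
    """

    while_stmt = """
    j = i
    while j < ls - 1 and len(wordlist[j]) > lc:
        j += 1
    """

    next_stmt = """
    j = next(j for j in range(i, ls) if len(wordlist[j]) <= lc)
    """

    # The next() form creates a generator object and resumes its frame for
    # every element it inspects; when the while loop only runs a couple of
    # times per call, that fixed per-call overhead dominates.
    print(timeit.timeit(while_stmt, setup, number=100000))
    print(timeit.timeit(next_stmt, setup, number=100000))

That per-call overhead would also explain why psyco, which can speed up a plain while loop a great deal, helps the generator version much less.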
EDIT: Maybe it is caused by the use of the psyco module? At least the running time with Python 2.7, which has no psyco, was 2.141 s for the next version, almost the same as Python 2.6 with psyco.
After removing the *.pyc files alone, the code did not slow down. Only when I also removed the import of psyco from the library module did I get the Python 2.6 timing for the run without psyco as well. Results for the non-psyco and psyco versions (now that the library routine also slows down, its timing is relevant too):
not psyco:

- while: preparation in library: 532 ms, total running time: 2.625 s
- next: preparation in library: 532 ms, total running time (time.clock()): 2.844 s (the version with xrange had the same wall time)

psyco:

- while: preparation in library: 297 ms, total running time: 609..675 ms
- next: preparation in library: 297 ms, total running time: 1.922 s (the version with range instead of xrange everywhere in the program: 1.985 s)
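For reference, psyco was presumably enabled with something like the following near the top of the main program and the library module (the exact placement in the author's code is not shown in the question):

    try:
        import psyco
        psyco.full()       # let psyco JIT-compile everything it can
        # or, more selectively: psyco.bind(hot_function)
    except ImportError:
        pass               # plain CPython, e.g. Python 2.7, where psyco is not available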
Running on a Windows XP system with an AMD Sempron 3100+ and 2 GB RAM. Counting the loops and calls with two globals:
    j = i
    callcount += 1
    while j < ls - 1 and len(wordlist[j]) > lc:
        j += 1
        loopcount += 1
Result for the test input with psyco:

    Finished in 625 ms
    Loopcount: 78317
    Callcount: 47970
    Ratio: 1.633
So the loop is inside a tight loop, but on average it is executed only a couple of times (notice that the two increments of the global counters did not slow down the code under psyco).
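For reference, a self-contained version of that instrumentation, with the skipping code wrapped in a helper function purely for illustration (the function name is invented; in the real program the counters sit inline in the recursive function):

    callcount = 0
    loopcount = 0

    def skip_long_words(wordlist, i, ls, lc):
        """Advance past words longer than lc, counting calls and inner iterations."""
        global callcount, loopcount
        callcount += 1
        j = i
        while j < ls - 1 and len(wordlist[j]) > lc:
            j += 1
            loopcount += 1
        return j

    # after the run:
    # print("Loopcount: %d" % loopcount)
    # print("Callcount: %d" % callcount)
    # print("Ratio: %.3f" % (loopcount / float(callcount)))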
CONCLUSIONS: The algorithm is highly sensitive to vocabulary length, which is why I had added this loop to drop some impossible words from consideration. However, the base cases of the recursion are later caught by a dictionary lookup anyway (an O(1) operation on average), so the earlier optimization, which had been highly beneficial, is no longer very beneficial. Even with longer input, and with the callcount counter moved to the beginning of the function, the call count turned out not to be affected by the vocabulary length, while the outer loop count is slightly reduced (the code originally posted is in the elif part of an if statement).
Longer runs (29 372 solutions), comparing the while loop against removing the loop entirely (using i instead of j) (library preparation 312 ms):
- Without the loop: elif branch count: 485488, outer loop count: 10129147, ratio: 0.048, runtime 6.000 s (without counters: 4.594 s)
- With the loop: loop count: 19355114, outer loop count: 8194033, ratio: 0.236, runtime 5.704 s (without counters: 4.688 s)

(running time without the loop, the counters and psyco: 32.792 s, library 608 ms)
So without the extra counters, the benefit of this loop under psyco in the harder case is: (4688 - 4594) * 100 / 4688.0 ≈ 2 %.
This inspired me to reverse another earlier optimization, which I had wondered about on DaniWeb. An earlier version of the code ran faster when the smallest word size was a global rather than a parameter. According to the documentation, local variable lookups are faster, but apparently the cost of making the recursion heavier outweighed that. In the harder case, this other reversal brought back the expected performance behaviour for the version without the word-length optimization: with psyco, 312 ms of preparation and 4.469..4.484 s total running time. So this made the code cleaner and brought more benefit in this case than the removed loop did. And putting the parameter back into the version with the while loop did not change the running time much (though the variation became greater for the library preparation code).
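To illustrate the trade-off (the function names and the recursion depth here are invented; this is not the author's solver), a small sketch that contrasts reading the smallest word length from a module-level global against passing it down as a parameter through a recursion; the parameter gives faster lookups, but every recursive call then has to carry one more argument:

    MIN_LEN = 2  # illustrative module-level global

    def depth_with_global(depth):
        # reads the module-level global at the base case
        if depth == 0:
            return MIN_LEN
        return depth_with_global(depth - 1)

    def depth_with_param(depth, min_len=2):
        # min_len is a local (fast lookup) but is passed on every call
        if depth == 0:
            return min_len
        return depth_with_param(depth - 1, min_len)

    if __name__ == '__main__':
        import timeit
        for fn in ('depth_with_global', 'depth_with_param'):
            t = timeit.timeit('%s(50)' % fn, 'from __main__ import %s' % fn, number=20000)
            print('%s: %.3f s' % (fn, t))

On the author's setup the global version came out faster; which variant wins in general depends on the Python version and on whether psyco is in use, so it is worth measuring both.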
**What I learned from this: if you do n optimizations for speed, you must re-check the first n-1 optimizations after doing the nth one.**