What is the rationale behind the advocated use of the for i in xrange(...)-style looping constructs in Python? For simple integer looping, the difference in overhead is substantial. I conducted a simple test using two pieces of code:

File idiomatic.py:

#!/usr/bin/env python

M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    for x in xrange(N):
        for y in xrange(M):
            pass

File cstyle.py:

#!/usr/bin/env python

M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    while x < N:
        while y < M:
            y += 1
        x += 1

Timing results were as follows:

bash-3.1$ time python cstyle.py

real    0m0.109s
user    0m0.015s
sys     0m0.000s

bash-3.1$ time python idiomatic.py

real    0m4.492s
user    0m0.000s
sys     0m0.031s

I can understand why the Pythonic version is slower -- I imagine it has a lot to do with calling xrange N times; perhaps this could be eliminated if there were a way to rewind a generator. However, with this degree of difference in execution time, why would one prefer the Pythonic version?
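
As a side note, here is a quick Python 2 sketch of the rewinding point: an individual iterator is one-shot, though an xrange object itself can be iterated repeatedly.

it = iter(xrange(3))
print list(it)   # [0, 1, 2]
print list(it)   # [] -- the iterator is exhausted; it cannot be rewound

xr = xrange(3)
print list(xr)   # [0, 1, 2]
print list(xr)   # [0, 1, 2] -- the xrange object itself is re-iterable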

Edit: I conducted the tests again using the code Mr. Martelli provided, and the results were indeed better this time.

I thought I'd enumerate the conclusions from the thread here:

1) Lots of code at the module scope is a bad idea, even if the code is enclosed in an if __name__ == "__main__": block.

2) Curiously enough, modifying thebadone into my incorrect version (letting y grow without resetting) produced little difference in performance, even for larger values of M and N.

+10  A: 

You forgot to reset y to 0 after the inner loop.

#!/usr/bin/env python
M = 10000
N = 10000

if __name__ == "__main__":
    x, y = 0, 0
    while x < N:
        while y < M:
            y += 1
        x += 1
        y = 0

Edit: 20.63s after the fix vs. 6.97s using xrange

Glenn Maynard
I was about to answer similarly. My corrected while-loop code ran about 3 times slower than the for-loop code.
Darius Bacon
In all seriousness, no. Language idioms always have to take reasonable performance into account; if the xrange idiom really *was* 40 times slower, it would be a flawed idiom that should either be fixed or no longer used. Readability is important, very often even at the expense of some performance--but not that much.
Glenn Maynard
OK deleted my answer anyway...
Felix Kling
Besides the performance, this error in the question is an excellent demonstration of why high-level loops are preferred!
Piotr Lesnicki
+3  A: 

Good for iterating over data structures.

The for i in ... syntax is great for iterating over data structures. In a lower-level language, you would generally iterate over an array indexed by an int, but with the Python syntax you can eliminate the indexing step entirely, as in the sketch below.
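
For example, a minimal Python 2 sketch (the names list is purely illustrative):

names = ["ann", "bob", "cho"]

# C-style: maintain and advance an index yourself.
i = 0
while i < len(names):
    print names[i]
    i += 1

# Pythonic: the for loop yields each item directly -- no index bookkeeping.
for name in names:
    print name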

Mark Harrison
+19  A: 

Here's the proper comparison, e.g. in loop.py:

M = 10000
N = 10000

def thegoodone():
    for x in xrange(N):
        for y in xrange(M):
            pass

def thebadone():
    x = 0
    while x < N:
        y = 0
        while y < M:
            y += 1
        x += 1

All substantial code should always be in functions -- putting a hundred million loops at a module's top level shows reckless disregard for performance and makes a mockery of any attempts at measuring said performance.

Once you've done that, you see:

$ python -mtimeit -s'import loop' 'loop.thegoodone()'
10 loops, best of 3: 3.45 sec per loop
$ python -mtimeit -s'import loop' 'loop.thebadone()'
10 loops, best of 3: 10.6 sec per loop

So, properly measured, the bad way that you advocate is about 3 times slower than the good way which Python promotes. I hope this makes you reconsider your erroneous advocacy.

Alex Martelli
"All substantial code should always be in functions - putting a hundred million loops at a module's top level shows reckless disregard for performance" (I see.) Could you explain why?
Federico Ramponi
@Federico, **speed**. Variable get and set is highly optimized in functions, but can't possibly be at module top-level. E.g., assuming that my laptop and Glenn's machine are equivalent, from our numbers you see a factor of 2 difference for doing things the proper way (all substantial code in functions) vs the totally wrong way (substantial code at module top level). Benchmark it yourself!-)
Alex Martelli
I just did. At module level: 14.3s vs. 36.0s. Local to function: 8.6s vs. 18.5s. (!!) I didn't know, thank you.
Federico Ramponi
+1 I can't blame susmits; after all, who would have known that code inside functions runs that differently compared to running outside of a function? Reading your answer makes complete sense, because after all, code "floating around" the file is module code, and this is not evident when someone is new to the language (as myself). Sometimes it's easy to think of Python as a supercharged bash :-S which, obviously as shown here, it is not. :)
OscarRyz
I've repeated the test. The idiomatic for loop is 5 times faster than the while loop: http://stackoverflow.com/questions/2611867/rationale-behind-pythons-preferred-for-syntax/2612089#2612089
J.F. Sebastian
+1 @Alex, could you expand a little more on why get and set can't be optimized at module top level, and why they can be inside functions?
Brendan Abel
@007brendan, module-level variables are visible from outside the module (i.e., other modules), so every single little change applied to them _must_ be immediately applied, every fetch _must_ be done afresh from the module's dict, unless you can **prove** otherwise with 100% certainty... which, given threading, is just impossible. Function-local variables are **not** visible (much less alterable) outside that function instance, which empowers all kinds of optimizations in any semi-decent Python implementation.
Alex Martelli
Ah, thank you. I had no idea that declaring variables in module scope could have such an impact on performance; 3x seems way more reasonable in comparison.
susmits
Also, in case it wasn't 100% clear, module-level variables must be looked up and set via a hash table (the_module.__dict__) while function-local variables are looked up and set in an array local to that function call.
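
A minimal sketch to see this in CPython, using the standard dis module (the statement being compiled is just illustrative):

import dis

# At module level, name access compiles to LOAD_NAME / STORE_NAME,
# which go through a dict lookup on every use.
dis.dis(compile("y = y + 1", "<module>", "exec"))

def f(y):
    y = y + 1  # in a function: LOAD_FAST / STORE_FAST, plain array slots
    return y

dis.dis(f)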
jemfinch
A: 

I've repeated the test from @Alex Martelli's answer. The idiomatic for loop is 5 times faster than the while loop:

$ python -mtimeit -s'from while_vs_for import while_loop as loop' 'loop(10000)'
10 loops, best of 3: 9.6 sec per loop
$ python -mtimeit -s'from while_vs_for import for_loop as loop'   'loop(10000)'
10 loops, best of 3: 1.83 sec per loop

while_vs_for.py:

def while_loop(N):
    x = 0
    while x < N:
        y = 0
        while y < N:
            pass  # mirror the for-loop's empty body so the comparison is fair
            y += 1
        x += 1

def for_loop(N):
    for x in xrange(N):
        for y in xrange(N):
            pass

At module level:

$ time -p python for.py
real 4.38
user 4.37
sys 0.01
$ time -p python while.py
real 14.28
user 14.28
sys 0.01
J.F. Sebastian
A: 

This is not a direct answer to the question, but I want to open up the dialog a bit more on xrange(). Two things:

A. There is something wrong with one of the OP's statements that no one has corrected yet (yes, in addition to the bug of not resetting y in the code):

"I imagine it has a lot to do with calling xrange N times...."

Unlike traditional counting for loops, Python's is more like a shell's foreach: it loops over an iterable. The expression after in is therefore evaluated exactly once per execution of the for statement, not once per iteration. (In the question's nested code, the inner xrange(M) is of course still evaluated once per outer iteration, since the inner for statement itself is re-executed each time.) A sketch of this is below.
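
A quick Python 2 sketch of this point (the wrapper function is purely illustrative):

def noisy_xrange(n):
    print "noisy_xrange called"
    return xrange(n)

for i in noisy_xrange(3):
    print i
# "noisy_xrange called" is printed exactly once; the loop then pulls
# 0, 1, 2 from the already-constructed iterable.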

B. xrange() is the name of this function in Python 2; in Python 3 it is renamed to range() (and the old list-building range() is gone), so keep this in mind when porting. If you didn't know already, xrange() returns an iterator-like object, while Python 2's range() builds and returns a complete list. Since the latter is far less memory-friendly, Python 3 dropped it in favor of the lazy version. The workaround in Python 3, for all those who need an actual list, is list(range(N)). A small sketch of the difference follows.
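
A small Python 2 sketch of the difference:

r = range(5)     # builds the entire list [0, 1, 2, 3, 4] in memory up front
xr = xrange(5)   # a lazy xrange object; values are produced on demand
print list(xr)   # materialize a list only when you actually need one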

wescpy
I thought that each time the inner loop finished all its iterations, a new xrange(M) object was created, because AFAIK generators cannot be rewound.
susmits