views:

3339

answers:

5

The other day I was doing some Python benchmarking and I came across something interesting. Below are two loops that do more or less the same thing. Loop 1 takes about twice as long as loop 2 to execute.

Loop 1:

int i = 0
while i < 100000000:
  i += 1

Loop 2:

for n in range(0,100000000):
  pass

Why is the first loop so much slower? I know it's a trivial example but it's piqued my interest. Is there something special about the range() function that makes it more efficient than incrementing a variable the same way?

+2  A: 

Because you are running more often in code written in C in the interpretor. i.e. i+=1 is in Python, so slow (comparatively), whereas range(0,...) is one C call the for loop will execute mostly in C too.

John Montgomery
+13  A: 

range() is implemented in C, whereas i += 1 is interpreted.

Using xrange() could make it even faster for large numbers. Starting with Python 3.0 range() is the same as previously xrange().

Georg
+30  A: 

see the disassembly of python byte code, you may get a more concrete idea

use while loop:

1           0 LOAD_CONST               0 (0)
            3 STORE_NAME               0 (i)

2           6 SETUP_LOOP              28 (to 37)
      >>    9 LOAD_NAME                0 (i)              # <-
           12 LOAD_CONST               1 (100000000)      # <-
           15 COMPARE_OP               0 (<)              # <-
           18 JUMP_IF_FALSE           14 (to 35)          # <-
           21 POP_TOP                                     # <-

3          22 LOAD_NAME                0 (i)              # <-
           25 LOAD_CONST               2 (1)              # <-
           28 INPLACE_ADD                                 # <-
           29 STORE_NAME               0 (i)              # <-
           32 JUMP_ABSOLUTE            9                  # <-
      >>   35 POP_TOP
           36 POP_BLOCK

The loop body has 10 op

use range:

1           0 SETUP_LOOP              23 (to 26)
            3 LOAD_NAME                0 (range)
            6 LOAD_CONST               0 (0)
            9 LOAD_CONST               1 (100000000)
           12 CALL_FUNCTION            2
           15 GET_ITER
      >>   16 FOR_ITER                 6 (to 25)        # <-
           19 STORE_NAME               1 (n)            # <-

2          22 JUMP_ABSOLUTE           16                # <-
      >>   25 POP_BLOCK
      >>   26 LOAD_CONST               2 (None)
           29 RETURN_VALUE

The loop body has 3 op

The time to run C code is much shorter than intepretor and can be ignored.

kcwu
+1 for explaining an answer with a disassembly
TwentyMiles
Actually the loop body in the first disassembly has 10 operations (the jump from position 32 to 9). In the current CPython implementation the interpretation of each bytecode results with a quite high probablitity in a costly indirect branch mispredict in the CPU (the jump to the implementation of the next bytecode). This is a consequence of the current implementation of CPython, the JITs being implemented by unladen swallow, PyPy and others will most likely lose that overhead. The best of them will also be able to do type specialization for an order of magnitude speedup.
Ants Aasma
A: 

Most of Python's built in method calls are run as C code. Code that has to be interpreted is much slower. In terms of memory efficiency and execution speed the difference is gigantic. The python internals have been optimized to the extreme, and it's best to take advantage of those optimizations.

ohdeargod
A: 

I am running python 2.5.1 (macbook black with leopard) and getting the exact opposite of this. Found this question when researching for a reason why a while loop is faster than a for.

medecau