I'm in an introductory comp-sci class (after doing web programming for years) and became curious about how much speed I was gaining, if any, with my one-liners.

for line in lines:
  numbers.append(eval(line.strip().split()[0]))

So I wrote the same thing with painfully explicit assignments and ran them against each other.

for line in lines:
  a = line.split()
  b = a[0]
  c = b.strip()
  d = eval(c)
  numbers.append(d)

The second one runs a consistent 30ms faster (on my FreeBSD shell account; see Edit #2) with an input file of 100K lines! Of course this is on a total run time of 3 seconds, so the percentage isn't large...but I'm really surprised to see all those explicit named assignments somehow helping.

There's a recent thread on the performance of functions as opposed to inline code, but this seems even more basic. What gives? Should I be writing lovingly fulsomely verbose code and telling my sneering colleagues it's for performance reasons? (Thankfully, a list-comprehension version runs about 10ms faster yet, so my cherished compactness isn't totally out the window.)
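A plausible shape for that list-comprehension version (an assumed reconstruction, since it isn't shown in the post):

```python
# Assumed reconstruction of the list-comprehension version; sample lines are made up.
lines = ["1 foo\n", "  2 bar\n", "3\n"]
numbers = [eval(line.strip().split()[0]) for line in lines]
# numbers is now [1, 2, 3]
```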

EDIT: Thanks for the tip on my sloppy expansion of the code. You're all correct that the second one should really be:

for line in lines:
  a = line.strip()
  b = a.split()
  c = b[0]
  d = eval(c)
  numbers.append(d)

However, even once I've fixed that, my timings are 2.714s, 2.652s, and 2.624s respectively, for the one-liner, the fully-exploded form, and the list comprehension (not pictured). So my question stands!

EDIT #2: It's interesting that the answer doesn't seem to be obvious even to a bunch of knowledgeable folks, which makes me feel a bit better about the question! I'll probably play with dis a bit on my own now, in this and similar circumstances, and see what turns up. By all means keep tinkering with the thread if you want, but I'm going to declare my received answer to be "huh, that's interesting; must be something deep." Especially since the behavior isn't consistent across machines, as steveha pointed out -- the slight difference goes the other direction on my Debian and Windows installs. Thanks to everyone who's contributed!

+7  A: 

Frankly speaking, the first version, with everything on one line, is a pain to read.
The second one is maybe a little too verbose (something in between would be ideal), but it is definitely more readable.

I would not worry too much about micro-optimizations of Python internals; focus on readable code instead.

By the way: the two (initial) versions are not doing the same thing.
In the former you first strip, then split; in the latter you first split and then strip (and only the first element).
I think you overlooked this because the one-line version is hard to follow at a glance.
Analyzing the two (updated) versions with dis (the Python disassembler) shows no real difference between them, only in the order in which the names are looked up. It is possible that this has an impact on performance.

While we are on this, you could get some performance improvement just by binding eval to a local variable, before the loop. I would expect that after that change, there should be no difference in time between the two versions.
For example:

eval_ = eval
for line in lines:
    a = line.strip()
    b = a.split()
    c = b[0]
    d = eval_(c)
    numbers.append(d)

We are mostly talking about micro-optimizations, but this aliasing is actually a technique that may be very useful in several circumstances.
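A sketch of how one might measure the effect of that aliasing, assuming the timeit module and made-up input data:

```python
import timeit

# Made-up sample data: lines with a number as the first field.
lines = ["  %d  \n" % i for i in range(500)]

def plain(lines):
    numbers = []
    for line in lines:
        numbers.append(eval(line.strip().split()[0]))  # global eval lookup each pass
    return numbers

def aliased(lines):
    numbers = []
    eval_ = eval  # one global lookup, then a fast local on every iteration
    for line in lines:
        numbers.append(eval_(line.strip().split()[0]))
    return numbers

# Both produce the same result; only the lookup pattern differs.
assert plain(lines) == aliased(lines)

t_plain = timeit.timeit(lambda: plain(lines), number=20)
t_alias = timeit.timeit(lambda: aliased(lines), number=20)
```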

Roberto Liffredo
Thanks; and yes, I'm all for readable code. My question in this case, though, is solely about why one is faster than the other -- I'm not actually trying to decide between them in a practical sense. Like I said, I'm actually *using* a third option entirely.
Jenn D.
+1  A: 

A one-liner doesn't mean smaller or faster code. And I would expect that the eval() call would throw off performance measurements quite a bit.

Do you see similar performance differences without eval?
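One way to try that suggestion (a sketch with made-up data): run both loop shapes with eval() removed, so only the string handling is compared:

```python
# Made-up input: leading/trailing spaces around a number, plus a second field.
lines = ["  %d  spam\n" % i for i in range(1000)]

compact = []
for line in lines:
    compact.append(line.strip().split()[0])  # one-liner body, eval() removed

verbose = []
for line in lines:
    a = line.strip()
    b = a.split()
    c = b[0]
    verbose.append(c)  # exploded body, eval() removed

# The two shapes produce identical results; timing them isolates the loop overhead.
assert compact == verbose
```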

Seth
+15  A: 

Your code isn't exploded in the same order. The compact version goes:

A > B > C > D > E

while your exploded version goes

B > C > A > D > E

(where A = strip, B = split, C = index, D = eval, E = append). The effect is that strip() is deferred two steps down, which may affect performance depending on the input.

johnvey
I think the perf increase must be eval() not having to process so much whitespace. Otherwise it seems like the single strip() would be faster than many strip()s.
Frank Schwieterman
+4  A: 

The method calls are also not in the same order:

for line in lines:
    numbers.append(eval(line.strip().split()[0]))

should be:

for line in lines:
    numbers.append(eval(line.split()[0].strip()))
jmans
Argh. Right you are -- and johnvey and Mihail too! I'm glad I labeled it "beginner". :) Off to fix that and do some more timings.
Jenn D.
+3  A: 

I agree with Roberto Liffredo; don't worry about that small of a performance improvement; code that is easier to understand, debug, and change is its own reward.

As for what's going on: the terse code and the expanded code don't do quite the same things. line.strip().split() first strips the line and then splits it; your expanded code splits the line first, and then calls strip() on the first word from the line. Now, that strip() isn't needed: it would remove white space from around the first word, and words returned by split() never have any. Thus, in your expanded version, strip() has absolutely no work to do.

Without benchmarking, I can't be certain, but I think that strip() having no work to do is the key. In the one-line version, strip() sometimes has work to do; so it will strip the whitespace, building a new string object, and then return that string object. Then, that new string object will be split and discarded. The extra work of creating and discarding string objects is likely what is making the one-line solution slower. Compare that with the expanded version, where strip() simply looks at the string, decides it has no work to do, and returns the string unmodified.
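A tiny demonstration of that point (the "returns the string unmodified" part is CPython behavior, not a language guarantee):

```python
# split() already discards surrounding whitespace, so strip() finds nothing to do.
word = "   42   ".split()[0]
assert word == "42"
assert word.strip() == word  # no new string content is produced
```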

In summary, I predict that a one-liner equivalent to your expanded code will be slightly faster than your expanded code. Try benchmarking this:

for line in lines:
  numbers.append(eval(line.split()[0].strip()))

If you want to be completely thorough, you could benchmark both versions with the strip() removed completely. You just don't need it. Or, you could pre-process your input file, making sure that there is no leading or trailing white space on any input line, and thus never any work for strip() to do, and you will probably see the benchmarks work as you would expect.

If you really want to make a hobby out of optimizing for speed here, you could call split with a "maxsplit" argument; you don't need to process the whole string as you are throwing away everything after the first split. Thus you could call split(None, 1). You can get rid of the strip(), of course. And you would then have:

for line in lines:
  numbers.append(eval(line.split(None, 1)[0]))

If you knew the numbers were always integers, you could call int() instead of eval(), for a speed improvement and security improvement.
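A sketch of that last suggestion, assuming every first field really is an integer:

```python
# Made-up sample lines; int() both validates and converts the first field.
lines = ["  7 apples\n", "12 pears\n"]
numbers = [int(line.split(None, 1)[0]) for line in lines]
# numbers is now [7, 12]
```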

steveha
The one-liner is the original, so it does need the strip to get rid of line endings when there's only one item on the line. The various suggestions for further optimization and security are correct, but it's a class project; the professor controls the input, which includes floats. The question I'm asking isn't directly related to the assignment; I just want to know why the exploded version runs faster (once it's actually made equivalent to the one-liner, which was of course my mistake). Thanks for the tip on maxsplit; that's one to tuck away for later.
Jenn D.
No, really, `split()` will strip white space even if there is only one item on the line. Try this: `print " foo ".split()[0]  # prints "foo"`
steveha
Argh, the wiki ate my white space. Try putting multiple spaces before and after the "foo" in the string, and you will see that `split` gets rid of them.
steveha
Could've sworn it wasn't behaving like that when I tried it at first; but right you are!
Jenn D.
+1 for replacing `line.strip().split()[0]` with the more explicit (and probably faster) `line.split(None, 1)[0]`.
EOL
+3  A: 

Also, sometimes it's tricky to run benchmarks. Did you re-run the benchmarks multiple times and take the best of several runs? Is there any chance that caching effects give a performance advantage to the second Python program you run? Have you tried making your input file ten times bigger, so your program will take about ten times longer to run?
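For what it's worth, the timeit module automates the best-of-several-runs approach; a sketch with a made-up workload:

```python
import timeit

# Taking the minimum of several repeats is least disturbed by caching and
# background load; the statement being timed here is only a stand-in.
timer = timeit.Timer("[int(w) for w in words]", setup="words = ['42'] * 1000")
best = min(timer.repeat(repeat=5, number=100))
```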

steveha
Yep, did all that, but thanks for the tips. Differences are still there at a million lines, and consistent across multiple runs.
Jenn D.
+2  A: 

I haven't benchmarked it, but one factor in the time differences is that you have to do several variable lookups in the second function.

From Python Patterns - An Optimization Anecdote:

This is because local variable lookups are much faster than global or built-in variable lookups: the Python "compiler" optimizes most function bodies so that for local variables, no dictionary lookup is necessary, but a simple array indexing operation is sufficient.

So even local variable lookups, while fast, are not free. Let's take a look at the disassembled functions:

First, making sure I have the same defined functions as you:

>>> def a(lines):
...     for line in lines:
...         numbers.append(eval(line.strip().split()[0]))

>>> def b(lines):
...     for line in lines:
...         a = line.strip()
...         b = a.split()
...         c = b[0]
...         d = eval(c)
...         numbers.append(d)

Now, let's compare their disassembled values:

>>> import dis
>>> dis.dis(a)
  2           0 SETUP_LOOP              49 (to 52)
              3 LOAD_FAST                0 (lines)
              6 GET_ITER            
        >>    7 FOR_ITER                41 (to 51)
             10 STORE_FAST               1 (line)

  3          13 LOAD_GLOBAL              0 (numbers)
             16 LOAD_ATTR                1 (append)
             19 LOAD_GLOBAL              2 (eval)
             22 LOAD_FAST                1 (line)
             25 LOAD_ATTR                3 (strip)
             28 CALL_FUNCTION            0
             31 LOAD_ATTR                4 (split)
             34 CALL_FUNCTION            0
             37 LOAD_CONST               1 (0)
             40 BINARY_SUBSCR       
             41 CALL_FUNCTION            1
             44 CALL_FUNCTION            1
             47 POP_TOP             
             48 JUMP_ABSOLUTE            7
        >>   51 POP_BLOCK           
        >>   52 LOAD_CONST               0 (None)
             55 RETURN_VALUE        
>>> dis.dis(b)
  2           0 SETUP_LOOP              73 (to 76)
              3 LOAD_FAST                0 (lines)
              6 GET_ITER            
        >>    7 FOR_ITER                65 (to 75)
             10 STORE_FAST               1 (line)

  3          13 LOAD_FAST                1 (line)
             16 LOAD_ATTR                0 (strip)
             19 CALL_FUNCTION            0
             22 STORE_FAST               2 (a)

  4          25 LOAD_FAST                2 (a)
             28 LOAD_ATTR                1 (split)
             31 CALL_FUNCTION            0
             34 STORE_FAST               3 (b)

  5          37 LOAD_FAST                3 (b)
             40 LOAD_CONST               1 (0)
             43 BINARY_SUBSCR       
             44 STORE_FAST               4 (c)

  6          47 LOAD_GLOBAL              2 (eval)
             50 LOAD_FAST                4 (c)
             53 CALL_FUNCTION            1
             56 STORE_FAST               5 (d)

  7          59 LOAD_GLOBAL              3 (numbers)
             62 LOAD_ATTR                4 (append)
             65 LOAD_FAST                5 (d)
             68 CALL_FUNCTION            1
             71 POP_TOP             
             72 JUMP_ABSOLUTE            7
        >>   75 POP_BLOCK           
        >>   76 LOAD_CONST               0 (None)
             79 RETURN_VALUE

It's a lot of information, but we can see the second method is riddled with STORE_FAST/LOAD_FAST pairs for the local variables being used. That might be enough to account for your small timing differences, perhaps in addition to the different operation order others have mentioned.
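One rough way to check whether those extra STORE_FAST/LOAD_FAST pairs cost anything is to time the two shapes head to head (a sketch with made-up data; results will vary by machine and interpreter):

```python
import timeit

# Made-up input data of the same shape as the question's.
lines = ["  %d  \n" % i for i in range(1000)]

def one_liner(lines):
    numbers = []
    for line in lines:
        numbers.append(eval(line.strip().split()[0]))
    return numbers

def exploded(lines):
    numbers = []
    for line in lines:
        a = line.strip()
        b = a.split()
        c = b[0]
        d = eval(c)
        numbers.append(d)
    return numbers

assert one_liner(lines) == exploded(lines)  # identical results either way

t_one = timeit.timeit(lambda: one_liner(lines), number=20)
t_exp = timeit.timeit(lambda: exploded(lines), number=20)
```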

Mark Rushakoff
Ah ha! Yes, this is addressing my question. I can see that I'll have to look into this dis thing; like I said, I'm familiar with Python but new to comp sci and "real" programming. So... you think that maybe somehow the variables are more local, and therefore faster, in the expanded version? You're already using the corrected operation order, so that's out of the picture now. Great stuff; thanks!
Jenn D.
Hi Mark, using dis() is very helpful, however doesn't your answer try to show why the second method would be slower? Whereas he's saying it's slightly faster.
benhoyt
@Jenn, @ben: While the bytecode does appear to suggest the second approach would be slower, I really don't know how much (if at all) the bytecode is optimized when the VM runs it. It may be that the `LOAD_ATTR`/`LOAD_FAST` pairs actually get translated to something more efficient than the individual `LOAD_FAST`s. Hopefully someone a little more familiar can pipe in, but I do think I'm going to take a closer look at the VM source out of my own curiosity.
Mark Rushakoff
That "An Optimization Anecdote" is very, very old (from 2002 at the latest). Try timing the code in that text and you'll see that pretty much nothing in it still holds. Local variable lookups *used* to be a performance problem seven years ago, but that issue and most of the others in that text are long gone.
THC4k
benhoyt: I wondered about that too, and thought perhaps I was misinterpreting Mark's interpretation. :) But I've checkmarked this answer because at least it's a way to *demonstrate* some difference between the two or three options. As he says, maybe an optimization guru will wander by.
Jenn D.
Note that later tests have shown this behavior not to be universal on all python installs (thanks, steveha). My ISP's FreeBSD userhost with Python 2.5.4 runs the exploded version faster, but others seem to show the difference being in the other direction. So Mark was right to begin with -- his machine probably *is* running the one-liner faster! Very odd.
Jenn D.
+2  A: 

Okay, enough theorizing. I created a file with one million lines, with random amounts of white space (0 to 4 spaces, usually 0) at beginning and end of each line. And I ran your one-liner, your expanded version, and my own list comprehension version (as fast as I know how to make it).

My results? (Each one is the best of three trials):

one-line: 13.208s
expanded: 26.321s
listcomp: 13.024s

I tested under Ubuntu 9.04, 32-bit, with Python 2.6.2 (CPython, of course).

So I am completely unable to explain why you saw the expanded one running faster, given that it ran half as fast on my computer.

Here's the Python program I used to generate my test data:

import random

f = open("/tmp/junk.txt", "w")

r = random.Random()

def randws():
    n = r.randint(0, 10) - 4
    if n < 0 or n > 4:
        n = 0
    return " " * n

for i in xrange(1000000):
    s0 = randws()
    n = r.randint(0, 256)
    s1 = randws()
    f.write("%s%d%s\n" % (s0, n, s1))

Here's my list comprehension program:

lines = open("/tmp/junk.txt")

numbers = [eval(line.split(None, 1)[0]) for line in lines]

P.S. Here is a nice, fast version that can handle both int and float values.

lines = open("/tmp/junk.txt")

def val(x):
    try:
        return int(x)
    except ValueError:
        pass

    try:
        return float(x)
    except StandardError:
        return 0

numbers = [val(line.split(None, 1)[0]) for line in lines]

Its best-of-three time was: 2.161s

steveha
By the way, I just tried modifying my list comprehension version, replacing `eval()` with `int()` (since my sample data has nothing but integers). The result: 1.942s. Six times as fast as the `eval()` version! I also wrote a version which, instead of calling `int()`, calls a function that can handle both `int` and `float` types; it's only slightly slower. (The code got mangled here; see the P.S. in the answer above.)
steveha
Argh, I'm still learning my way around here. That code is pretty hard to read, so I edited the answer above it; ignore the previous comment and look at the P.S. in the answer.
steveha
Wow; I appreciate the efforts! The thing is, though, that *I'm not trying to optimize this code*. It's a class project that's inherently simplistic. I'm just trying to understand why the two or three bits I've got, as written, differ in the way they do. Hm; I happened to be doing all this under Python 2.5.4 on FreeBSD. I'll try some other versions and platforms and see what I get.
Jenn D.
Argh! Indeed, 2.4.4 on Debian is showing the exploded version as the slowest...but in my case, only by about 2%, not 200%. Same for 2.5 on Windows. I guess that answers that, then: the answer is that the second version *doesn't* necessarily run faster, and there's something deep and circumstantial going on.
Jenn D.