I'm in an introductory comp-sci class (after doing web programming for years) and became curious about how much speed I was gaining, if any, with my one-liners.

for line in lines:
  numbers.append(eval(line.strip().split()[0]))

So I wrote the same thing with painfully explicit assignments and ran them against each other.

for line in lines:
  a = line.split()
  b = a[0]
  c = b.strip()
  d = eval(c)
  numbers.append(d)

The second one runs a consistent 30ms faster (on my FreeBSD shell account; see Edit #2) with an input file of 100K lines! Of course this is on a total run time of 3 seconds, so the percentage isn't large...but I'm really surprised to see all those explicit named assignments somehow helping.

There's a recent thread on the performance of functions as opposed to inline code, but this seems even more basic. What gives? Should I be writing lovingly fulsomely verbose code and telling my sneering colleagues it's for performance reasons? (Thankfully, a list-comprehension version runs about 10ms faster yet, so my cherished compactness isn't totally out the window.)
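A plausible shape for that list-comprehension version (an assumed reconstruction, since it isn't shown in the post):

```python
# Assumed reconstruction of the list-comprehension version; sample lines are made up.
lines = ["1 foo\n", "  2 bar\n", "3\n"]
numbers = [eval(line.strip().split()[0]) for line in lines]
# numbers is now [1, 2, 3]
```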

EDIT: Thanks for the tip on my sloppy expansion of the code. You're all correct that the second one should really be:

for line in lines:
  a = line.strip()
  b = a.split()
  c = b[0]
  d = eval(c)
  numbers.append(d)

However, even once I've fixed that, my timings are 2.714s, 2.652s, and 2.624s respectively, for the one-liner, the fully-exploded form, and the list comprehension (not pictured). So my question stands!

EDIT #2: It's interesting that the answer doesn't seem to be obvious even to a bunch of knowledgeable folks, which makes me feel a bit better about the question! I'll probably play with dis a bit on my own now, in this and similar circumstances, and see what turns up. By all means keep tinkering with the thread if you want, but I'm going to declare my received answer to be "huh, that's interesting; must be something deep." Especially since the behavior isn't consistent across machines, as steveha pointed out -- the slight difference goes the other direction on my Debian and Windows installs. Thanks to everyone who's contributed!

+7  A: 

Frankly speaking, the first version, with everything on one line, is a pain to read.
The second one is maybe a little too verbose (something in between would be ideal), but it is definitely more readable.

I would not worry too much about micro-optimizations of Python internals; focus on readable code instead.

By the way: the two (initial) versions are not doing the same thing.
In the former you first strip, then split; in the latter you first split and then strip (and only the first element).
I think you overlooked this because the one-line version is hard to follow at a glance.
Analyzing the two (updated) versions with dis (the Python disassembler) shows no real difference between them, only in the order in which the names are looked up. It is possible that this has an impact on performance.

While we are on this, you could get some performance improvement just by binding eval to a local variable, before the loop. I would expect that after that change, there should be no difference in time between the two versions.
For example:

eval_ = eval
for line in lines:
    a = line.strip()
    b = a.split()
    c = b[0]
    d = eval_(c)
    numbers.append(d)

We are mostly talking about micro-optimizations, but this aliasing is actually a technique that may be very useful in several circumstances.
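A sketch of how one might measure the effect of that aliasing, assuming the timeit module and made-up input data:

```python
import timeit

# Made-up sample data: lines with a number as the first field.
lines = ["  %d  \n" % i for i in range(500)]

def plain(lines):
    numbers = []
    for line in lines:
        numbers.append(eval(line.strip().split()[0]))  # global eval lookup each pass
    return numbers

def aliased(lines):
    numbers = []
    eval_ = eval  # one global lookup, then a fast local on every iteration
    for line in lines:
        numbers.append(eval_(line.strip().split()[0]))
    return numbers

# Both produce the same result; only the lookup pattern differs.
assert plain(lines) == aliased(lines)

t_plain = timeit.timeit(lambda: plain(lines), number=20)
t_alias = timeit.timeit(lambda: aliased(lines), number=20)
```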

Roberto Liffredo
Thanks; and yes, I'm all for readable code. My question in this case, though, is solely about why one is faster than the other -- I'm not actually trying to decide between them in a practical sense. Like I said, I'm actually *using* a third option entirely.
Jenn D.
+1  A: 

A one-liner doesn't mean smaller or faster code. And I would expect that the eval() call would throw off performance measurements quite a bit.

Do you see similar performance differences without eval?
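One way to try that suggestion (a sketch with made-up data): run both loop shapes with eval() removed, so only the string handling is compared:

```python
# Made-up input: leading/trailing spaces around a number, plus a second field.
lines = ["  %d  spam\n" % i for i in range(1000)]

compact = []
for line in lines:
    compact.append(line.strip().split()[0])  # one-liner body, eval() removed

verbose = []
for line in lines:
    a = line.strip()
    b = a.split()
    c = b[0]
    verbose.append(c)  # exploded body, eval() removed

# The two shapes produce identical results; timing them isolates the loop overhead.
assert compact == verbose
```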

Seth
+15  A: 

Your code isn't exploded in the same order. The compact version goes:

A > B > C > D > E

while your exploded version goes

B > C > A > D > E

(where A = strip, B = split, C = index, D = eval, E = append). The effect is that strip() is deferred two steps down, which may affect performance depending on the input.

johnvey
I think the perf increase must be eval() not having to process so much whitespace. Otherwise it seems like the single strip() would be faster than many strip()s.
Frank Schwieterman
+4  A: 

The method calls are also not in the same order:

for line in lines:
    numbers.append(eval(line.strip().split()[0]))

should be:

for line in lines:
    numbers.append(eval(line.split()[0].strip()))
jmans
Argh. Right you are -- and johnvey and Mihail too! I'm glad I labeled it "beginner". :) Off to fix that and do some more timings.
Jenn D.
+3  A: 

I agree with Roberto Liffredo; don't worry about that small of a performance improvement; code that is easier to understand, debug, and change is its own reward.

As for what's going on: the terse code and the expanded code don't do quite the same things. line.strip().split() first strips the line and then splits it; your expanded code splits the line first, and then calls strip() on the first word from the line. Now, that strip() isn't needed: it would remove white space from around the first word, and words returned by split() never have any. Thus, in your expanded version, strip() has absolutely no work to do.

Without benchmarking, I can't be certain, but I think that strip() having no work to do is the key. In the one-line version, strip() sometimes has work to do; so it will strip the whitespace, building a new string object, and then return that string object. Then, that new string object will be split and discarded. The extra work of creating and discarding string objects is likely what is making the one-line solution slower. Compare that with the expanded version, where strip() simply looks at the string, decides it has no work to do, and returns the string unmodified.
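A tiny demonstration of that point (the "returns the string unmodified" part is CPython behavior, not a language guarantee):

```python
# split() already discards surrounding whitespace, so strip() finds nothing to do.
word = "   42   ".split()[0]
assert word == "42"
assert word.strip() == word  # no new string content is produced
```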

In summary, I predict that a one-liner equivalent to your expanded code will be slightly faster than your expanded code. Try benchmarking this:

for line in lines:
  numbers.append(eval(line.split()[0].strip()))

If you want to be completely thorough, you could benchmark both versions with the strip() removed completely. You just don't need it. Or, you could pre-process your input file, making sure that there is no leading or trailing white space on any input line, and thus never any work for strip() to do, and you will probably see the benchmarks work as you would expect.

If you really want to make a hobby out of optimizing for speed here, you could call split with a "maxsplit" argument; you don't need to process the whole string as you are throwing away everything after the first split. Thus you could call split(None, 1). You can get rid of the strip(), of course. And you would then have:

for line in lines:
  numbers.append(eval(line.split(None, 1)[0]))

If you knew the numbers were always integers, you could call int() instead of eval(), for a speed improvement and security improvement.
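A sketch of that last suggestion, assuming every first field really is an integer:

```python
# Made-up sample lines; int() both validates and converts the first field.
lines = ["  7 apples\n", "12 pears\n"]
numbers = [int(line.split(None, 1)[0]) for line in lines]
# numbers is now [7, 12]
```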

steveha
The one-liner is the original, so it does need the strip to get rid of line endings when there's only one item on the line. The various suggestions for further optimization and security are correct, but it's a class project; the professor controls the input, which includes floats. The question I'm asking isn't directly related to the assignment; I just want to know why the exploded version runs faster (once it's actually made equivalent to the one-liner, which was of course my mistake). Thanks for the tip on maxsplit; that's one to tuck away for later.
Jenn D.
No, really, `split()` will strip white space even if there is only one item on the line. Try this: `print " foo ".split()[0]  # prints "foo"`
steveha
Argh, the wiki ate my white space. Try putting multiple spaces before and after the "foo" in the string, and you will see that `split` gets rid of them.
steveha
Could've sworn it wasn't behaving like that when I tried it at first; but right you are!
Jenn D.
+1 for replacing `line.strip().split()[0]` with the more explicit (and probably faster) `line.split(None, 1)[0]`.
EOL
+3  A: 

Also, sometimes it's tricky to run benchmarks. Did you re-run the benchmarks multiple times and take the best of several runs? Is there any chance that caching effects give a performance advantage to the second Python program you run? Have you tried making your input file ten times bigger, so your program will take about ten times longer to run?
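For what it's worth, the timeit module automates the best-of-several-runs approach; a sketch with a made-up workload:

```python
import timeit

# Taking the minimum of several repeats is least disturbed by caching and
# background load; the statement being timed here is only a stand-in.
timer = timeit.Timer("[int(w) for w in words]", setup="words = ['42'] * 1000")
best = min(timer.repeat(repeat=5, number=100))
```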

steveha
Yep, did all that, but thanks for the tips. Differences are still there at a million lines, and consistent across multiple runs.
Jenn D.
+2  A: 

I haven't benchmarked it, but one factor in the time differences is that you have to do several variable lookups in the second function.

From Python Patterns - An Optimization Anecdote:

This is because local variable lookups are much faster than global or built-in variable lookups: the Python "compiler" optimizes most function bodies so that for local variables, no dictionary lookup is necessary, but a simple array indexing operation is sufficient.

So even local variable lookups, while fast, are not free. Let's take a look at the disassembled functions:

First, making sure I have the same defined functions as you:

>>> def a(lines):
...     for line in lines:
...         numbers.append(eval(line.strip().split()[0]))

>>> def b(lines):
...     for line in lines:
...         a = line.strip()
...         b = a.split()
...         c = b[0]
...         d = eval(c)
...         numbers.append(d)

Now, let's compare their disassembled values:

>>> import dis
>>> dis.dis(a)
  2           0 SETUP_LOOP              49 (to 52)
              3 LOAD_FAST                0 (lines)
              6 GET_ITER            
        >>    7 FOR_ITER                41 (to 51)
             10 STORE_FAST               1 (line)

  3          13 LOAD_GLOBAL              0 (numbers)
             16 LOAD_ATTR                1 (append)
             19 LOAD_GLOBAL              2 (eval)
             22 LOAD_FAST                1 (line)
             25 LOAD_ATTR                3 (strip)
             28 CALL_FUNCTION            0
             31 LOAD_ATTR                4 (split)
             34 CALL_FUNCTION            0
             37 LOAD_CONST               1 (0)
             40 BINARY_SUBSCR       
             41 CALL_FUNCTION            1
             44 CALL_FUNCTION            1
             47 POP_TOP             
             48 JUMP_ABSOLUTE            7
        >>   51 POP_BLOCK           
        >>   52 LOAD_CONST               0 (None)
             55 RETURN_VALUE        
>>> dis.dis(b)
  2           0 SETUP_LOOP              73 (to 76)
              3 LOAD_FAST                0 (lines)
              6 GET_ITER            
        >>    7 FOR_ITER                65 (to 75)
             10 STORE_FAST               1 (line)

  3          13 LOAD_FAST                1 (line)
             16 LOAD_ATTR                0 (strip)
             19 CALL_FUNCTION            0
             22 STORE_FAST               2 (a)

  4          25 LOAD_FAST                2 (a)
             28 LOAD_ATTR                1 (split)
             31 CALL_FUNCTION            0
             34 STORE_FAST               3 (b)

  5          37 LOAD_FAST                3 (b)
             40 LOAD_CONST               1 (0)
             43 BINARY_SUBSCR       
             44 STORE_FAST               4 (c)

  6          47 LOAD_GLOBAL              2 (eval)
             50 LOAD_FAST                4 (c)
             53 CALL_FUNCTION            1
             56 STORE_FAST               5 (d)

  7          59 LOAD_GLOBAL              3 (numbers)
             62 LOAD_ATTR                4 (append)
             65 LOAD_FAST                5 (d)
             68 CALL_FUNCTION            1
             71 POP_TOP             
             72 JUMP_ABSOLUTE            7
        >>   75 POP_BLOCK           
        >>   76 LOAD_CONST               0 (None)
             79 RETURN_VALUE

It's a lot of information, but we can see the second method is riddled with STORE_FAST/LOAD_FAST pairs for the local variables being used. That might be enough to account for your small timing differences, perhaps in addition to the different operation order others have mentioned.
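One rough way to check whether those extra STORE_FAST/LOAD_FAST pairs cost anything is to time the two shapes head to head (a sketch with made-up data; results will vary by machine and interpreter):

```python
import timeit

# Made-up input data of the same shape as the question's.
lines = ["  %d  \n" % i for i in range(1000)]

def one_liner(lines):
    numbers = []
    for line in lines:
        numbers.append(eval(line.strip().split()[0]))
    return numbers

def exploded(lines):
    numbers = []
    for line in lines:
        a = line.strip()
        b = a.split()
        c = b[0]
        d = eval(c)
        numbers.append(d)
    return numbers

assert one_liner(lines) == exploded(lines)  # identical results either way

t_one = timeit.timeit(lambda: one_liner(lines), number=20)
t_exp = timeit.timeit(lambda: exploded(lines), number=20)
```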

Mark Rushakoff
Ah ha! Yes, this is addressing my question. I can see that I'll have to look into this dis thing; like I said, I'm familiar with Python but new to comp sci and "real" programming. So... you think that maybe somehow the variables are more local, and therefore faster, in the expanded version? You're already using the corrected operation order, so that's out of the picture now. Great stuff; thanks!
Jenn D.
Hi Mark, using dis() is very helpful, however doesn't your answer try to show why the second method would be slower? Whereas he's saying it's slightly faster.
benhoyt
@Jenn, @ben: While the bytecode does appear to suggest the second approach would be slower, I really don't know how much (if at all) the bytecode is optimized when the VM runs it. It may be that the `LOAD_ATTR`/`LOAD_FAST` pairs actually get translated to something more efficient than the individual `LOAD_FAST`s. Hopefully someone a little more familiar can pipe in, but I do think I'm going to take a closer look at the VM source out of my own curiosity.
Mark Rushakoff
That "An Optimization Anecdote" is very, very old (from 2002 at the latest). Try timing the code in that text and you'll see that pretty much nothing in it still holds. Local variable lookups *used* to be a performance problem seven years ago, but that issue and most of the others in that text are long gone.
THC4k
benhoyt: I wondered about that too, and thought perhaps I was misinterpreting Mark's interpretation. :) But I've checkmarked this answer because at least it's a way to *demonstrate* some difference between the two or three options. As he says, maybe an optimization guru will wander by.
Jenn D.
Note that later tests have shown this behavior not to be universal on all python installs (thanks, steveha). My ISP's FreeBSD userhost with Python 2.5.4 runs the exploded version faster, but others seem to show the difference being in the other direction. So Mark was right to begin with -- his machine probably *is* running the one-liner faster! Very odd.
Jenn D.
+2  A: 

Okay, enough theorizing. I created a file with one million lines, with random amounts of white space (0 to 4 spaces, usually 0) at beginning and end of each line. And I ran your one-liner, your expanded version, and my own list comprehension version (as fast as I know how to make it).

My results? (Each one is the best of three trials):

one-line: 13.208s
expanded: 26.321s
listcomp: 13.024s

I tested under Ubuntu 9.04, 32-bit, with Python 2.6.2 (CPython, of course).

So I am completely unable to explain why you saw the expanded one running faster, given that it ran half as fast on my computer.

Here's the Python program I used to generate my test data:

import random

f = open("/tmp/junk.txt", "w")

r = random.Random()

def randws():
    n = r.randint(0, 10) - 4
    if n < 0 or n > 4:
        n = 0
    return " " * n

for i in xrange(1000000):
    s0 = randws()
    n = r.randint(0, 256)
    s1 = randws()
    f.write("%s%d%s\n" % (s0, n, s1))

Here's my list comprehension program:

lines = open("/tmp/junk.txt")

numbers = [eval(line.split(None, 1)[0]) for line in lines]

P.S. Here is a nice, fast version that can handle both int and float values.

lines = open("/tmp/junk.txt")

def val(x):
    try:
        return int(x)
    except ValueError:
        pass

    try:
        return float(x)
    except StandardError:
        return 0

numbers = [val(line.split(None, 1)[0]) for line in lines]

Its best-of-three time was: 2.161s

steveha
By the way, I just tried modifying my list comprehension version, replacing `eval()` with `int()` (since my sample data has nothing but integers). The result: 1.942s. Six times as fast as the `eval()` version! I also wrote a version which, instead of calling `int()`, calls a function that can handle both `int` and `float` types; it's only slightly slower. (The code got mangled here; see the P.S. in the answer above.)
steveha
Argh, I'm still learning my way around here. That code is pretty hard to read, so I edited the answer above it; ignore the previous comment and look at the P.S. in the answer.
steveha
Wow; I appreciate the efforts! The thing is, though, that *I'm not trying to optimize this code*. It's a class project that's inherently simplistic. I'm just trying to understand why the two or three bits I've got, as written, differ in the way they do. Hm; I happened to be doing all this under Python 2.5.4 on FreeBSD. I'll try some other versions and platforms and see what I get.
Jenn D.
Argh! Indeed, 2.4.4 on Debian is showing the exploded version as the slowest...but in my case, only by about 2%, not 200%. Same for 2.5 on Windows. I guess that answers that, then: the answer is that the second version *doesn't* necessarily run faster, and there's something deep and circumstantial going on.
Jenn D.