ansaurus

Question

Python string 'join' is faster(?) than '+', but what's wrong here?

Answer 1

+4 A:

I assume x() is slower because you're first building the array and then joining it. So you're not only measuring the time that join takes, but also the time that you take to build the array.

In a scenario where you already have an array and you want to create a string out of its elements, join should be faster than iterating through the array and building the string step by step.

sepp2k 2009-08-28 21:05:30

+1: I agree; I get similarly proportioned timings in python 2.6. And, the join command is really designed for existing arrays; you should add a function that does a list concatenation: return ''.join([ "abcdefg"[i%7] for i in range(100) ]) and see what happens.

Walt W 2009-08-28 21:08:47

I of course meant list comprehension. But yeah, I get timings on the order of y() and z() with a function using join with a list comprehension.

Walt W 2009-08-28 21:10:12

That means 'join' is good for ready-made string list only? In that case I think '+' is better, because strings are not found in list form always. Although list comprehension is good, but here, too, to mention that 'range' is used for example purpose only. Practical scenario may differ and force you to use a loop instead.

mshsayem 2009-08-28 21:18:44

That is kind of what sepp2k is implying. I think it largely depends on how you go about generating the "parts" of your string. For most applications, do what's convenient, not what's absolutely optimal. The difference really shouldn't be that much in *most* practical applications.

Walt W 2009-08-28 21:24:42

That's right "For most applications, do what's convenient, not what's absolutely optimal". But in my earlier post most people seemed abhorring the "+".

mshsayem 2009-08-28 21:28:44

Well . . . the biggest problem I can see with + would be if you wanted to change the separating text (say from ", " to ";") then it could be a real pain. Putting together a list generally makes the code look cleaner, as it's apparent that "here is a string being built and here is an atomic list of the components." But it shouldn't matter much.

Walt W 2009-08-28 21:34:26

Answer 2

+1 A:

Interesting: I've done some tests where the size of the string changes, and this is what I found:

def x():
    x = "a" * 100
    s=[]
    for i in range(100):
        # Other codes here...
        s.append(x)
    return ''.join(s)

def z():
    x = "a" * 100
    s=''
    for i in xrange(100):
        # Other codes here...
        s=s+x
    return s

from timeit import timeit
print "x:", timeit(x, number=1000000)
print "z:", timeit(z, number=1000000)

For strings of length 1 (x = "a" * 1):

x: 27.2318270206
z: 14.4046051502

For strings of length 100:

x: 30.0796670914
z: 21.5891489983

And for strings of length 1000, running timeit 100,000 times instead of 1,000,000

x: 14.1769361496
z: 31.4864079952

Which, if my reading of Objects/stringobject.c is correct, makes sense.

It appears, on a first reading, that the String.join algorithm (edge-cases aside) is:

def join(sep, sequence):
    size = 0
    for string in sequence:
        size += len(string) + len(sep)

    result = malloc(size)

    for string in sequence:
        copy string into result
        copy sep into result

    return result

So this will require more or less O(S) steps (where S is the sum of the lengths of all the strings being joined).

David Wolever 2009-08-28 21:40:15

Last 'timeit' couple seems interesting...

mshsayem 2009-08-28 21:47:27

Answer 3

+1 A:

There's a difference between += and + with strings -- if there's no other references to "x", x+=y can just append to x, rather than having to take a copy of the string to append to -- which is the same benefit you get from using "".join().

The main benefit from "".join() over + or += is that join() should always give linear performance, while in many cases +/+= will give quadratic performance (ie, when you double the amount of text, you quadruple the amount of time taken). But this will only matter with a lot of text, not merely 100 bytes, and I think it won't get triggered if you only have one reference to the string you're appending to.

In detail:

Your best case performance for string concatenation is to look at every character in the final string once. "".join() does that naturally -- it has all the information it needs to right from the start.

However a+=b can work in two ways, it can either just add "b" to an existing string, in which case it only needs to look at the characters in "b", or it can have to look at the characters in "a" too.

In C, strcat() always looks at all the characters in both strings, so it's works badly always. In Python, however, the string length is stored, so the string can be extended as long as it's not referenced elsewhere -- and you get good performance by only copying the characters in "b". If it is referenced elsewhere, python will make a copy of "a" first, then add "b" to the end, giving you bad performance. If you're appending five strings in this manner, your time taken will be:

ab = a+b       # Time is a + b
abc = ab+c     # Time is (a+b) + c
abcd = abc+d   # Time is (a+b+c) + d
abcde = abcd+e # Time is (a+b+c+d) + e

which if a,b,c,d,e are all roughly the same size, say, n, is n*(n-1)/2-1 operations, or essentially n-squared.

To get the bad behaviour for x+=y, try:

def a(n=100):
    res = ""
    for k in xrange(n):
        v=res
        res += "foobar"
    return res

Even though v isn't actually used, it's enough to trigger the slower path for += and get the bad behaviour that worries people.

I believe += wasn't introduced until Python 2.0, so it wasn't possible to append efficiently without using something like "".join() in Python 1.6 and earlier.

Anthony Towns 2009-08-28 21:41:20

Answer 4

A:

String concatenation was a lot slower before Python 2.5, when it still created a new copy for every string concatenation rather than appending to the original, leading to join() becoming a popular workaround.

Here's an old benchmark demonstrating the old problem: http://www.skymind.com/~ocrow/python%5Fstring/

David 2009-08-28 21:41:25

I have seen that before and wondering...please note that list comprehension can not be used here...

mshsayem 2009-08-28 21:50:49

Answer 5

+1 A:

In addition to what the others said, 100 1-char strings is really small. (I'm kind of surprised you get separation of results at all.) That's the sort of dataset that fits in your processor cache. You're not going to see asymptotic performance on a microbenchmark.

Ken 2009-08-28 21:42:26

Answer 6

+1 A:

As to why q is a lot slower: when you say

l += "a"

you are appending the string "a" to the end of l, but when you say

l = l + ["a"]

you are creating a new list with the contents of l and ["a"] and then reassigning the results back to l. Thus new lists are constantly being generated.

Kathy Van Stone 2009-08-28 22:00:21

Are these in-place semantics documented anywhere for Python lists? I know that's the way it works in NumPy.

dwf 2009-08-28 22:11:58

l.append("a") is available for constant time list append operations; or l.extend(["a","bb","ccc"]) if you need to add multiple items at once.

Anthony Towns 2009-08-28 22:23:22

Finally, someone who mentioned append.

FogleBird 2009-08-28 22:57:08

@dwf: I wasn't able to immediately find such documentation; I just checked it out in Python.

Kathy Van Stone 2009-08-31 18:21:54

Answer 7

+2 A:

You are measuring two distinct operations: the creation of an array of strings, and the concatenation of strings.

    import timeit
    def x():
        s = []
        for i in range(100):
            s.append("abcdefg"[i%7])
        return ''.join(s)
    def y():
        s = ''
        for i in range(100):
            s += "abcdefgh"[i%7]

    # timeit.timeit(x) returns about 32s
    # timeit.timeit(y) returns about 23s

From the above it would indeed seem that '+' is a faster operation than join. But consider:

    src = []
    def c():
        global src
        s = []
        for i in range(100):
            s.append("abcdefg"[i%7])
        src = s
    def x2():
        return ''.join(src)
    def y2():
        s = ''
        for i in range(len(src)):
            s += src[i]
        return s

    # timeit.timeit(c) returns about 30s
    # timeit.timeit(x2) returns about 1.5s
    # timeit.timeit(y2) returns about 14s

In other words, by timing x() vs y(), your result is polluted by the construction of your source array. If you break that out, you find that join is faster.

Furthermore, you're working with small arrays, and your timing numbers just happen to coincide. If you increase the size of the array and the length of each string significantly, the differences are more clear:

   def c2():
       global src
       s = []
       for i in range(10000):
           s.append("abcdefghijklmnopqrstuvwxyz0123456789"
       src = s

   # timeit.timeit(x2, number=10000) returns about 1s
   # timeit.timeit(y2, number=10000) returns about 80s

Peter S Magnusson 2009-08-28 22:12:17

Answer 8

+2 A:

This question is really about what things cost. We'll play a bit fast and loose here, subtracting results in similar cases. You can decide for yourself if this is a valid method. Here are some basic test cases:

import timeit
def append_to_list_with_join():
    s=[]
    for i in xrange(100):
        s.append("abcdefg"[i%7])
    return ''.join(s)

def append_to_list_with_join_opt():
    s=[]
    x = s.append
    for i in xrange(100):
        x("abcdefg"[i%7])
    return ''.join(s)

def plus_equals_string():
    s=''
    for i in xrange(100):
        s+="abcdefg"[i%7]
    return s

def plus_assign_string():
    s=''
    for i in xrange(100):
        s=s+"abcdefg"[i%7]
    return s

def list_comp_join():
    return ''.join(["abcdefg"[i%7] for i in xrange(100)])

def list_comp():
    return ["abcdefg"[i%7] for i in xrange(100)]

def empty_loop():
    for i in xrange(100):
     pass

def loop_mod():
    for i in xrange(100):
     a = "abcdefg"[i%7]

def fast_list_join():
    return "".join(["0"] * 100)

for f in [append_to_list_with_join, append_to_list_with_join_opt, plus_equals_string,plus_assign_string,list_comp_join, list_comp, empty_loop,loop_mod, fast_list_join]:
    print f.func_name, timeit.timeit(f)

And here is what they cost:

append_to_list_with_join 25.4540209021
append_to_list_with_join_opt 19.9999782794
plus_equals_string 16.7842428996
plus_assign_string 14.8312124167
list_comp_join 16.329590353
list_comp 14.6934344309
empty_loop 2.3819276612
loop_mod 10.1424356308
fast_list_join 2.58149394686

First off, lots of things have unexpected costs in python. append_to_list_with_join versus append_to_list_with_join_opt shows that even looking up a method on an object has a non-negligible cost. In this case, looking up s.append is one-quarter of the time.

Next, list_comp_join versus list_comp shows that join() is pretty fast: It takes about 1.7 or only 10% of list_comp_join's time.

loop_mod shows that the greatest part of this test is actually in setting up the data, irrespective of which string construction method is used. By inference, the time taken for "string = string + ", "string += ", and list comprehension are:

plus_equals_string = 16.78 - 10.14 = 6.64
plus_assign_string = 14.83 - 10.14 = 4.69
list_comp = 14.69 - 10.14 = 4.55

So as to the OP's question, join() is fast but the time to create the underlying list, whether with list primitives or list comprehension, is comparable to creating the string with string primitives. If you already have a list, convert it to a string with join() -- it will be fast.

The timings the OP presents indicate that constructing lists using concatenate operators is slow. In contrast, using list comprehensions is fast. If you have to build a list, use a list comprehension.

Finally, let's take three of the OP's closest functions: what is the difference between x, p, and q? Let's simplify a bit:

import timeit
def x():
    s=[]
    for i in range(100):
        s.append("c")

def p():
    s=[]
    for i in range(100):
        s += "c"

def q():
    s=[]
    for i in range(100):
        s = s + ["c"]

for f in [x,p,q]:
    print f.func_name, timeit.timeit(f)

Here are the results:

x 16.0757342064
p 87.1533697719
q 85.0999698984

And here is the disassembly:

>>> import dis
>>> dis.dis(x)
  2           0 BUILD_LIST               0
              3 STORE_FAST               0 (s)

  3           6 SETUP_LOOP              33 (to 42)
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               1 (100)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                19 (to 41)
             22 STORE_FAST               1 (i)

  4          25 LOAD_FAST                0 (s)
             28 LOAD_ATTR                1 (append)
             31 LOAD_CONST               2 ('c')
             34 CALL_FUNCTION            1
             37 POP_TOP
             38 JUMP_ABSOLUTE           19
        >>   41 POP_BLOCK
        >>   42 LOAD_CONST               0 (None)
             45 RETURN_VALUE
>>> dis.dis(p)
  2           0 BUILD_LIST               0
              3 STORE_FAST               0 (s)

  3           6 SETUP_LOOP              30 (to 39)
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               1 (100)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                16 (to 38)
             22 STORE_FAST               1 (i)

  4          25 LOAD_FAST                0 (s)
             28 LOAD_CONST               2 ('c')
             31 INPLACE_ADD
             32 STORE_FAST               0 (s)
             35 JUMP_ABSOLUTE           19
        >>   38 POP_BLOCK
        >>   39 LOAD_CONST               0 (None)
             42 RETURN_VALUE
>>> dis.dis(q)
  2           0 BUILD_LIST               0
              3 STORE_FAST               0 (s)

  3           6 SETUP_LOOP              33 (to 42)
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               1 (100)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                19 (to 41)
             22 STORE_FAST               1 (i)

  4          25 LOAD_FAST                0 (s)
             28 LOAD_CONST               2 ('c')
             31 BUILD_LIST               1
             34 BINARY_ADD
             35 STORE_FAST               0 (s)
             38 JUMP_ABSOLUTE           19
        >>   41 POP_BLOCK
        >>   42 LOAD_CONST               0 (None)
             45 RETURN_VALUE

The loops are nearly identical. The comparison amounts to CALL_FUNCTION+POP_TOP vs. INPLACE_ADD+STORE_FAST vs. BUILD_LIST+BINARY_ADD+STORE_FAST. However, I can't give a more low-level explanation than that -- I just can't find costs of python bytecodes on the Net. However, you might get some inspiration from looking at Doug Hellmann's Python Module of the Week posting on dis.

hughdbrown 2009-08-29 01:27:37

'dis' module is interesting.

mshsayem 2009-08-29 09:08:07

I have put some more in my answer(see below). Please have a look and comment.

mshsayem 2009-09-09 05:38:10

Answer 9

+12 A:

Some of us Python committers, I believe mostly Rigo and Hettinger, went out of their way (on the way to 2.5 I believe) to optimize some special cases of the alas-far-too-common s += something blight, arguing that it was proven that beginners will never be covinced that ''.join is the right way to go and the horrible slowness of the += might be giving Python a bad name. Others of us weren't that hot, because they just couldn't possibly optimize every occurrence (or even just a majority of them) to decent performance; but we didn't feel hotly enough on the issue to try and actively block them.

I believe this thread proves we should have opposed them more sternly. As it is now, they optimized += in a certain hard-to-predict subset of cases to where it can be maybe 20% faster for particular stupid cases than the proper way (which IS still ''.join) -- just a perfect way to trap beginners into pursuing those irrelevant 20% gains by using the wrong idiom... at the cost, once in a while and from their POV out of the blue, of being hit with a performance loss of 200% (or more, since non-linear behavior IS still lurking there just outside of the corners that Hettinger and Rigo prettied up and put flowers in;-) -- one that MATTERS, one that WILL make them miserable. This goes against the grain of Python's "ideally only one obvious way to do it" and it feels to me like we, collectively, have lain a trap for beginners -- the best kind, too... those who don't just accept what they're told by their "betters", but inquisitively go and question and explore.

Ah well -- I give up. OP, @mshsayem, go ahead, use += everywhere, enjoy your irrelevant 20% speedups in trivial, tiny, irrelevant cases, and you'd better enjoy them to the hilt -- because one day, when you can't see it coming, on an IMPORTANT, LARGE operation, you'll be hit smack in the midriff by the oncoming trailer truck of a 200% slowdown (unless you get unlucky and it's a 2000% one;-). Just remember: if you ever feel that "Python is horribly slow", REMEMBER, more likely than not it's one of your beloved loops of += turning around and biting the hand that feeds it.

For the rest of us -- those who understand what it means to say We should forget about small efficiencies, say about 97% of the time, I'll keep heartily recommending ''.join, so we all can sleep in all tranquility and KNOW we won't be hit with a superlinear slowdown when we least expect and least can afford you. But for you, Armin Rigo, and Raymond Hettinger (the last two, dear personal friends of mine, BTW, not just co-commiters;-) -- may your += be smooth and your big-O's never worse than N!-)

So, for the rest of us, here's a more meaningful and interesting set of measurements:

$ python -mtimeit -s'r=[str(x)*99 for x in xrange(100,1000)]' 's="".join(r)'
1000 loops, best of 3: 319 usec per loop

900 strings of 300 chars each, joining the list directly is of course fastest, but the OP is terrified about having to do appends before then. But:

$ python -mtimeit -s'r=[str(x)*99 for x in xrange(100,1000)]' 's=""' 'for x in r: s+=x'
1000 loops, best of 3: 779 usec per loop
$ python -mtimeit -s'r=[str(x)*99 for x in xrange(100,1000)]' 'z=[]' 'for x in r: z.append(x)' '"".join(z)'
1000 loops, best of 3: 538 usec per loop

...with a semi-important amount of data (a very few 100's of KB -- taking a measurable fraction of a millisecond every which way), even plain good old .append is alread superior. In addition, it's obviously and trivially easy to optimize:

$ python -mtimeit -s'r=[str(x)*99 for x in xrange(100,1000)]' 'z=[]; zap=z.append' 'for x in r: zap(x)' '"".join(z)'
1000 loops, best of 3: 438 usec per loop

shaving another tenths of a millisecond over the average looping time. Everybody (at least everybody who's totally obsessed abound performance) obviously knows that HOISTING (taking OUT of the inner loop a repetitive computation that would be otherwise performed over and over) is a crucial technique in optimization -- Python doesn't hoist on your behalf, so you have to do your own hoisting in those rare occasions where every microsecond matters.

Alex Martelli 2009-08-29 02:47:39

It's not that I have hatred on 'join' or unusual passion on '+'. (Actually, in most case I use list comprehension and 'join'). I just wanted the explanation of their behaviors in the mentioned codes.

mshsayem 2009-08-29 09:14:31

Great question/answer, this also shows that people who say "it doesn't matter that we add one more undocumented behavior to CPython because nobody will ever rely on it" don't know what they are talking about.

ilya n. 2009-08-29 09:23:01

Answer 10

A:

I have figured out the answer from the answers posted here by experts. Python string concatenation (and timing measurements) depends on these (as far as I've seen):

Number of concatenations
Average length of strings
Number of function callings

I have built a new code that relates these. Thanks to Peter S Magnusson, sepp2k, hughdbrown, David Wolever and others for indicating important points I had missed earlier. Also, in this code I might have missed something. So, I highly appreciate any replies pointing our errors, suggestions, criticisms etc. After all, I am here for learning. Here is my new code:

from timeit import timeit

noc = 100
tocat = "a"
def f_call():
    pass

def loop_only():
    for i in range(noc):
        pass

def concat_method():
    s = ''
    for i in range(noc):
        s = s + tocat

def list_append():
    s=[]
    for i in range(noc):
        s.append(tocat)
    ''.join(s)

def list_append_opt():
    s = []
    zap = s.append
    for i in range(noc):
        zap(tocat)
    ''.join(s)

def list_comp():
    ''.join(tocat for i in range(noc))

def concat_method_buildup():
    s=''

def list_append_buildup():
    s=[]

def list_append_opt_buildup():
    s=[]
    zap = s.append

def function_time(f):
    return timeit(f,number=1000)*1000

f_callt = function_time(f_call)

def measure(ftuple,n,tc):
    global noc,tocat
    noc = n
    tocat = tc
    loopt = function_time(loop_only) - f_callt
    buildup_time = function_time(ftuple[1]) -f_callt if ftuple[1] else 0
    total_time = function_time(ftuple[0])
    return total_time, total_time - f_callt - buildup_time - loopt*ftuple[2]

functions ={'Concat Method\t\t':(concat_method,concat_method_buildup,True),
            'List append\t\t\t':(list_append,list_append_buildup,True),
            'Optimized list append':(list_append_opt,list_append_opt_buildup,True),
            'List comp\t\t\t':(list_comp,0,False)}

for i in range(5):
    print("\n\n%d concatenation\t\t\t\t10'a'\t\t\t\t 100'a'\t\t\t1000'a'"%10**i)
    print('-'*80)
    for (f,ft) in functions.items():
        print(f,"\t|",end="\t")
        for j in range(3):
            t = measure(ft,10**i,'a'*10**j)
            print("%.3f %.3f |" % t,end="\t")
        print()

And here is what I have got. [In the time column two times (scaled) are shown: first one is the total function execution time, and the second time is the actual(?) concatenation time. I have deducted the function calling time, function buildup time(initialization time), and iteration time. Here I am considering a case where it can't be done without loop (say more statement inside).]

1 concatenation                 1'a'                  10'a'               100'a'
-------------------     ----------------------  -------------------  ----------------
List comp               |   2.310 2.168       |  2.298 2.156       |  2.304 2.162
Optimized list append   |   1.069 0.439       |  1.098 0.456       |  1.071 0.413
Concat Method           |   0.552 0.034       |  0.541 0.025       |  0.565 0.048
List append             |   1.099 0.557       |  1.099 0.552       |  1.094 0.552


10 concatenations                1'a'                  10'a'               100'a'
-------------------     ----------------------  -------------------  ----------------
List comp               |   3.366 3.224       |  3.473 3.331       |  4.058 3.916
Optimized list append   |   2.778 2.003       |  2.956 2.186       |  3.417 2.639
Concat Method           |   1.602 0.943       |  1.910 1.259       |  3.381 2.724
List append             |   3.290 2.612       |  3.378 2.699       |  3.959 3.282


100 concatenations               1'a'                  10'a'               100'a'
-------------------     ----------------------  -------------------  ----------------
List comp               |   15.900 15.758     |  17.086 16.944     |  20.260 20.118
Optimized list append   |   15.178 12.585     |  16.203 13.527     |  19.336 16.703
Concat Method           |   10.937 8.482      |  25.731 23.263     |  29.390 26.934
List append             |   20.515 18.031     |  21.599 19.115     |  24.487 22.003


1000 concatenations               1'a'                  10'a'               100'a'
-------------------     ----------------------  -------------------  ----------------
List comp               |   134.507 134.365   |  143.913 143.771   |  201.062 200.920
Optimized list append   |   112.018 77.525    |  121.487 87.419    |  151.063 117.059
Concat Method           |   214.329 180.093   |  290.380 256.515   |  324.572 290.720
List append             |   167.625 133.619   |  176.241 142.267   |  205.259 171.313


10000 concatenations              1'a'                  10'a'               100'a'
-------------------     ----------------------  -------------------  ----------------
List comp               |   1309.702 1309.560 |  1404.191 1404.049 |  2912.483 2912.341
Optimized list append   |   1042.271 668.696  |  1134.404 761.036  |  2628.882 2255.804
Concat Method           |   2310.204 1941.096 |  2923.805 2550.803 |  STUCK    STUCK
List append             |   1624.795 1251.589 |  1717.501 1345.137 |  3182.347 2809.233

To sum up all these I have made this decisions for me:

If you have a string list available, string 'join' method is best and fastest.
If you can use list comprehension, that's the easiest and fast as well.
If you need 1 to 10 concatenation (average) with length 1 to 100, list append, '+' both takes same (almost, note that times are scaled) time.
Optimized list append seems very good in most situation.
When #concatenation or string length rises, '+' starts to take significantly more and more time. Note that, for 10000 concatenations with 100'a' my PC is stuck!
If you use list append and 'join' always, you are safe all time (pointed by Alex Martelli).
But in some situation say, where you need to take user input and print 'Hello user's world!', it is simplest to use '+'. I think building a list and join for this case like x = input("Enter user name:") and then x.join(["Hello ","'s world!"]) is uglier than "Hello %s's world!"%x or "Hello " +x+ "'s world"
Python 3.1 has improved concatenation performance. But, in some implementation like Jython, '+' is less efficient.
Premature optimization is the root of all evil (experts' saying). Most of the time you do not need optimization. So, don't waste time in aspiration for optimization (unless you are writing a big or computational project where every micro/milli second counts.
Use these information and write in whatever way you like taking circumstances under consideration.
If you really need optimization , use a profiler, find the bottlenecks and try to optimize those.

Finally, I say it again that I am a noob and trying to learn python more deeply. So, it is not unusual that there will be mistakes (error) in my observations. So, comment on this and suggest me if I am taking a wrong route. Thanks to all for participating.

mshsayem 2009-09-07 18:07:11

ansaurus

tags:

views:

answers:

Python string 'join' is faster(?) than '+', but what's wrong here?

related questions