Following the comments on my answer in this thread, I wanted to know what the speed difference is between the += operator and ''.join().

So what is the speed comparison between the two?

+3  A: 

This is what silly programs are designed to test :)

Use plus

import time

if __name__ == '__main__':
    start = time.clock()
    for x in range (1, 10000000):
        dog = "a" + "b"

    end = time.clock()
    print "Time to run Plusser = ", end - start, "seconds"

Output of:

Time to run Plusser =  1.16350010965 seconds

Now with join....

import time
if __name__ == '__main__':
    start = time.clock()
    for x in range (1, 10000000):
        dog = "a".join("b")

    end = time.clock()
    print "Time to run Joiner = ", end - start, "seconds"

Output Of:

Time to run Joiner =  21.3877386651 seconds

So on Python 2.6 on Windows, I would say + is about 18 times faster than join :)

bwawok
http://docs.python.org/library/timeit.html
SilentGhost
Your test only uses small strings, which gives misleading output: once you try with longer strings (see my answer) you'll probably see different results. Also, you should use xrange, which is cheaper on memory, and you can omit the `1` in your call to range.
Wayne Werner
Thanks for the tips :) I am still learning Python, more of a side hobby when I need a break from Java.
bwawok
This is broken in more than one place. Check what `'a'.join('b')` evaluates to: it is 'b'. What you meant is ''.join(['a', 'b']). Also, 'a'+'b' will likely be optimized to a constant during compilation, so what are you testing then, assignment?
Nas Banov
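A minimal sketch of how the two points raised in the comments could be addressed, written for Python 3 rather than the Python 2.6 used in the answer. It uses the timeit module linked above, takes the operands from variables so that the expression cannot be folded into a constant at compile time, and joins a list of the two strings. The helper names time_plus and time_join and the default iteration count are assumptions made for illustration.

```python
import timeit

a, b = "a", "b"

# 'a'.join('b') uses 'a' as the separator and 'b' as the iterable,
# so it returns 'b' and never concatenates anything.
assert "a".join("b") == "b"
assert "".join([a, b]) == a + b == "ab"


def time_plus(number=100000):
    # Operands come from variables, so the compiler cannot precompute the result.
    return timeit.timeit("a + b", globals={"a": a, "b": b}, number=number)


def time_join(number=100000):
    # Joins a two-element list, which is the operation the answer meant to time.
    return timeit.timeit("''.join([a, b])", globals={"a": a, "b": b}, number=number)


if __name__ == "__main__":
    print("plus:", time_plus(), "seconds")
    print("join:", time_join(), "seconds")
```

The absolute numbers depend on the interpreter version and the machine, so this only compares the two operations for a pair of very short strings.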
+6  A: 

From: Efficient String Concatenation

Method 1:

def method1():
  out_str = ''
  for num in xrange(loop_count):
    out_str += `num`
  return out_str

Method 4:

def method4():
  str_list = []
  for num in xrange(loop_count):
    str_list.append(`num`)
  return ''.join(str_list)

Now I realise they are not strictly representative, and the 4th method appends to a list before iterating through and joining each item, but it's a fair indication.

String join is significantly faster than concatenation.

Why? Strings are immutable and can't be changed in place. To alter one, a new string has to be created (a concatenation of the two), which means copying the contents of both operands.
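For anyone who wants to rerun the comparison, here is a rough Python 3 sketch of Method 1 and Method 4. It assumes that repr(num) is an acceptable stand-in for the Python 2 backtick syntax and that loop_count is passed in as a parameter; the compare helper and its default values are hypothetical and not part of the linked article.

```python
import timeit


def method1(loop_count):
    # Repeated += on a string.
    out_str = ""
    for num in range(loop_count):
        out_str += repr(num)
    return out_str


def method4(loop_count):
    # Collect the pieces in a list, then join once at the end.
    str_list = []
    for num in range(loop_count):
        str_list.append(repr(num))
    return "".join(str_list)


def compare(loop_count=20000, repeats=5):
    # Best-of-N wall-clock time for a single call to each method.
    t1 = min(timeit.repeat(lambda: method1(loop_count), number=1, repeat=repeats))
    t4 = min(timeit.repeat(lambda: method4(loop_count), number=1, repeat=repeats))
    return t1, t4


if __name__ == "__main__":
    plus_time, join_time = compare()
    print("method1 (+=):  ", plus_time, "seconds")
    print("method4 (join):", join_time, "seconds")
```

Both functions return the same string, so any difference in the timings comes from how the result is built.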


Dominic Bou-Samra
Well I was going to just answer this myself (hence the tag) but it looks like you beat me to the punch! +1, especially for the useful link!
Wayne Werner
@Wayne: *Useful link* is copied from the question that you've linked to!
SilentGhost
-1. There is no fixed ratio for the speed difference between string.join and + concatenation, because they have completely different **growth rates**/big-O complexity. As the number of strings to concatenate grows, string.join will have a greater and greater margin over string concatenation.
Lie Ryan
This is out of date and wrong.
nate c
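The growth-rate point from the comments can be illustrated with a simple counting model in Python 3. This is only a sketch under one assumption: every += allocates a new string and copies the old contents plus the new piece, while join copies each character once into a single result. Real interpreters may avoid some of those copies (CPython can sometimes extend the left operand in place), so measured timings will not necessarily follow the model. Both function names are made up for this example.

```python
def chars_copied_by_repeated_concat(pieces):
    """Characters copied if every += builds a brand-new string."""
    copied = 0
    length = 0
    for piece in pieces:
        length += len(piece)
        copied += length  # the new string holds the old contents plus the piece
    return copied


def chars_copied_by_join(pieces):
    """Characters copied if the result is allocated once and filled in."""
    return sum(len(piece) for piece in pieces)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        pieces = ["x"] * n
        print(n, chars_copied_by_repeated_concat(pieces), chars_copied_by_join(pieces))
```

Under this model, n one-character pieces cost n(n+1)/2 character copies with += and n with join, so the ratio between the two keeps growing with n.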
+1  A: 

It looks like += is faster for strings shorter than about 40 characters, while for longer strings it quickly approaches the worst-case O(N squared) behaviour.

The times are as follows:

Iterations: 1,000,000
String length    += (seconds)    ''.join() (seconds)
1                    0.953990         1.3280
4                    1.233990         1.8140
6                    1.516000         2.2810
12                   2.250000         3.2500
80                  15.530900        12.3750
222                101.797000        30.5160
443                238.063990        57.2030
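To see how the gap changes with length, the ratio of the two columns can be computed directly from the timings above. This is a small Python 3 sketch; the numbers are copied from the table, and the function name slowdown_ratios is made up for this example.

```python
# (string length, seconds for +=, seconds for ''.join()), copied from the table
timings = [
    (1, 0.953990, 1.3280),
    (4, 1.233990, 1.8140),
    (6, 1.516000, 2.2810),
    (12, 2.250000, 3.2500),
    (80, 15.530900, 12.3750),
    (222, 101.797000, 30.5160),
    (443, 238.063990, 57.2030),
]


def slowdown_ratios(rows):
    """Return (length, += time divided by join time) for each row."""
    return [(length, plus / join) for length, plus, join in rows]


if __name__ == "__main__":
    for length, ratio in slowdown_ratios(timings):
        print(f"{length:>4}  {ratio:.2f}")
```

A ratio below 1 means += was faster for that length, and a ratio above 1 means ''.join() was faster.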

And here is the code:

import time

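def strcat(string):
    newstr = ''
    for char in string:
        # Append one character per pass, mirroring listcat below. As originally
        # posted this line was `newstr += string`, which appends the whole input
        # each time and builds a result of len(string)**2 characters.
        newstr += char
    return newstr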

def listcat(string):
    chars = []
    for char in string:
        chars.append(char)
    return ''.join(chars)

def test(fn, times, *args):
    start = time.time()
    for x in xrange(times):
        fn(*args)
    return time.time() - start

def testall():
    strings = ['a', 'long', 'longer', 'a bit longer', 
               '''adjkrsn widn fskejwoskemwkoskdfisdfasdfjiz  oijewf sdkjjka dsf sdk siasjk dfwijs''',
               '''this is a really long string that's so long
               it had to be triple quoted  and contains lots of
               superflous characters for kicks and gigles
               @!#(*_#)(*$(*!#@&)(*E\xc4\x32\xff\x92\x23\xDF\xDFk^%#$!)%#^(*#''',
              '''I needed another long string but this one won't have any new lines or crazy characters in it, I'm just going to type normal characters that I would usually write blah blah blah blah this is some more text hey cool what's crazy is that it looks that the str += is really close to the O(n^2) worst case performance, but it looks more like the other method increases in a perhaps linear scale? I don't know but I think this is enough text I hope.''']

    for string in strings:
        print "String of len:", len(string), "took:", test(listcat, 1000000, string), "seconds"
    for string in strings:
        print "String of len:", len(string), "took:", test(strcat, 1000000, string), "seconds"

testall()
Wayne Werner