views:

1090

answers:

6

There are a lot of articles around the web concerning python performance, the first thing you read: concatenating strings should not be done using '+': avoid s1+s2+s3, instead use str.join

I tried the following: concatenating two strings as part of a directory path: three approaches:

  1. '+' which i should not do
  2. str.join
  3. os.path.join

Here is my code:

import os,time

s1='/part/one/of/dir'
s2='part/two/of/dir'
N=10000

t=time.clock()
for i in xrange(N):
    s=s1+os.sep+s2
print time.clock()-t

t=time.clock()
for i in xrange(N):
    s=os.sep.join((s1,s2))
print time.clock()-t

t=time.clock()
for i in xrange(N):
    s=os.path.join(s1,s2)
print time.clock()-t

Here the results (python 2.5 WinXP)

0.0182201927899
0.0262544541275
0.120238186697

Shouldn't it be exactly the other way round ?

+3  A: 

It is true you should not use '+'. Your example is quite special, try the same code with:

s1='*'*100000
s2='+'*100000

Then the second version (str.join) is much faster.

RSabet
+9  A: 

Most of the performance issues with string concatenation are ones of asymptotic performance, so the differences become most significant when you are concatenating many long strings. In your sample, you are performing the same concatenation many times. You aren't building up any long string, and it may be that the python interpreter is optimizing your loops. This would explain why the time increases when you move to str.join and path.join - they are more complex functions that are not as easily reduced. (os.path.join does a lot of checking on the strings to see if they need to be rewritten in any way before they are concatenated. This sacrifices some performance for the sake of portability.)

By the way, since file paths are not usually very long, you almost certainly want to use os.path.join for the sake of the portability. If the performance of the concatenation is a problem, you're doing something very odd with your filesystem.

+5  A: 

Shouldn't it be exactly the other way round ?

Not necessarily. I don't know the internals of Python well enough to comment specifically but some common observations are that your first loop uses a simple operator + which is probly implemented as a primitive by the runtime. In contrast, the other loops first have to resolve a module name, resolve the variable/class found there and then call a member function on that.

Another note is that your loop might simply be too small to yield significant numbers. Considering your low overall running time, this probably makes your tests useless.

Also, your test case is highly specialized on two short strings. Such cases never give a clear picture of edge case performance.

Konrad Rudolph
+5  A: 

The advice is about concatenating a lot of strings.

To compute s = s1 + s2 + ... + sn,

1) using +. A new string s1+s2 is created, then a new string s1+s2+s3 is created,..., etc, so a lot of memory allocation and copy operations is involved. In fact, s1 is copied n-1 times, s2 is copied n-2 time, ..., etc.

2) using "".joing([s1,s2,...,sn]). The concatenation is done in one pass, and each char in the strings is copied only once.

In your code, join is called on each iteration, so it's just like using +. The correct way is collect the items in an array, then call join on it.

+1  A: 

String concatenation (+) has an optimized implementation on CPython. But this may not be the case on other architectures like Jython or IronPython. So when you want your code to performe well on these interpreters you should use the .join() method on strings. os.path.join() is specifically meant to join filesystem paths. It takes care of different path separators, too. This would be the right way to build a file name.

unbeknown
A: 

I would like to add a link to the python wiki, where there are notes regarding string concatenation and also that "this section is somewhat wrong with python2.5. Python 2.5 string concatenation is fairly fast".

I believe that string concatenation had a big improvement since 2.5, and that although str.join is still faster (specially for big strings), you will not see as much improvement as in older Python versions.

http://wiki.python.org/moin/PythonSpeed/PerformanceTips#StringConcatenation

JPCosta