views: 1448 · answers: 4

Is there any efficient mass string concatenation method in Python (like StringBuilder in C# or StringBuffer in Java)? I found the following methods here:

  • Simple concatenation using '+'
  • Using MutableString from the UserString module
  • Using a character array and the array module
  • Using a string list and the join method
  • Using StringIO from the cStringIO module

But what do you experts use or suggest, and why?
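For reference, here is a minimal sketch of three of the listed approaches in runnable form. It is written for Python 3, so `io.StringIO` stands in for `cStringIO` and `MutableString` (removed in Python 3) is omitted:

```python
import io

parts = [str(i) for i in range(100)]

# 1. simple '+' concatenation in a loop
s_plus = ''
for p in parts:
    s_plus += p

# 2. string list plus join (the usual idiom)
s_join = ''.join(parts)

# 3. file-like buffer (io.StringIO, the Python 3 successor of cStringIO)
buf = io.StringIO()
for p in parts:
    buf.write(p)
s_buf = buf.getvalue()

# all three build the same string
assert s_plus == s_join == s_buf
```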

+8  A: 

''.join(sequenceofstrings) is what usually works best -- simplest and fastest.

Alex Martelli
But to do that, shouldn't I build a list first? Is building an empty list, appending many, many strings to it, and then joining really better than simply concatenating strings? Why? Please explain... I am a newb.
mshsayem
@mshsayem, in Python a sequence can be any enumerable object, even a function.
Nick D
It does look grim though
Aiden Bell
I absolutely love the `''.join(sequence)` idiom. It's especially useful to produce comma-separated lists: `', '.join(['1', '2', '3'])` gives the string `'1, 2, 3'` (note that join requires the items to be strings).
Andrew Keeton
@Andrew .. that is indeed useful.
Aiden Bell
@Nick D: Please explain more. Best of all, please give a code example which concatenates strings in an efficient way... thanks
mshsayem
@mshsayem: `"".join(chr(x) for x in xrange(65,91))` --- in this case, the argument to join is an iterator, created through a generator expression. There's no temporary list that gets constructed.
balpha
No matter how you obtain the strings, ''.join is a good way to put them together -- possibly via an intermediate list or tuple. Tell us in what form you get them and we can help more!
Alex Martelli
@Alex Martelli: Suppose I want to build dynamic JavaScript to be put on a requested page; the JavaScript may vary depending on some conditions, and it can be large. In that case, should I build a list/tuple of strings first, or just concatenate?
mshsayem
@mshsayem, it looks like pieces of that JS may come at several different moments in your processing, and not necessarily in order, so I'd definitely go with a list. "Large" is a relative term: surely you're not going to inject many megabytes of JavaScript into a poor HTML page, are you?-) Even with tens of megabytes, an intermediate list and ''.join at the end would still perform just fine, anyway.
Alex Martelli
@balpha: and yet the generator version is slower than the list comprehension version:

C:\temp>python -mtimeit "''.join(chr(x) for x in xrange(65,91))"
100000 loops, best of 3: 9.71 usec per loop
C:\temp>python -mtimeit "''.join([chr(x) for x in xrange(65,91)])"
100000 loops, best of 3: 7.1 usec per loop
hughdbrown
@hughdbrown, yes, when you have free memory out the wazoo (typical timeit case) listcomp can be better optimized than genexp, often by 20-30%. When memory's tight things are different -- hard to reproduce in timeit, though!-)
Alex Martelli
Exactly. If someone is concerned about the efficiency of string concatenation, we're usually talking about loooong strings, i.e. higher memory usage; also, memory/time is a classical tradeoff. As always, the only reliable answer is measuring *the particular use case* and optimizing *for the particular situation.*
balpha
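The list-then-join approach Alex recommends for the JavaScript case might look like the sketch below. The fragments and the condition name are hypothetical, purely for illustration:

```python
# Collect the JavaScript fragments in a list as conditions are
# evaluated, then join exactly once at the end.
def build_script(user_is_admin):
    parts = ['var page = {};']
    if user_is_admin:  # hypothetical condition
        parts.append('page.adminPanel = true;')
    parts.append('initPage(page);')
    return '\n'.join(parts)

script = build_script(user_is_admin=True)
```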
+12  A: 

You may be interested in this: An optimization anecdote by Guido. It is also worth remembering that this is an old article which predates the existence of things like ''.join (although I guess string.joinfields is more or less the same).

On the strength of that, the array module may be fastest if you can shoehorn your problem into it. But ''.join is probably fast enough and has the benefit of being idiomatic, and thus easier for other Python programmers to understand.
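A hedged sketch of the array-module trick from Guido's anecdote, adapted for Python 3 (the `'c'` typecode used in the original Python 2 code no longer exists, so raw bytes are accumulated instead):

```python
import array

parts = [str(i) for i in range(1000)]

# array-based accumulation: append encoded bytes, decode once at the end
buf = array.array('b')
for p in parts:
    buf.frombytes(p.encode('ascii'))
via_array = buf.tobytes().decode('ascii')

# the idiomatic one-liner for comparison
via_join = ''.join(parts)

assert via_array == via_join
```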

Finally, the golden rule of optimization: don't optimize unless you know you need to, and measure rather than guess.

You can measure different methods using the timeit module. That can tell you which is fastest, instead of random strangers on the internet making guesses.
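A minimal example of such a measurement with the timeit module, comparing `+=` in a loop against the join idiom (the absolute numbers will vary by machine and interpreter, so only the comparison matters):

```python
import timeit

# build a 1000-piece string by repeated += concatenation
plus_time = timeit.timeit(
    "s = ''\nfor i in range(1000):\n    s += str(i)",
    number=200)

# build the same string with a generator expression and join
join_time = timeit.timeit(
    "''.join(str(i) for i in range(1000))",
    number=200)

print('+=  :', plus_time)
print('join:', join_time)
```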

John Fouhy
+1: Don't optimize until you know you need to.
S.Lott
+5  A: 

It depends on what you're doing.

Since Python 2.5, string concatenation with the + operator is pretty fast. If you're just concatenating a couple of values, using the + operator works best:

>>> x = timeit.Timer(stmt="'a' + 'b'")
>>> x.timeit()
0.039999961853027344

>>> x = timeit.Timer(stmt="''.join(['a', 'b'])")
>>> x.timeit()
0.76200008392333984

However, if you're putting together a string in a loop, you're better off using the list joining method:

>>> join_stmt = """
... joined_str = ''
... for i in xrange(100000):
...   joined_str += str(i)
... """
>>> x = timeit.Timer(join_stmt)
>>> x.timeit(100)
13.278000116348267

>>> list_stmt = """
... str_list = []
... for i in xrange(100000):
...   str_list.append(str(i))
... ''.join(str_list)
... """
>>> x = timeit.Timer(list_stmt)
>>> x.timeit(100)
12.401000022888184

...but notice that you have to be putting together a relatively high number of strings before the difference becomes noticeable.

Jason Baker
he does mass concatenation
SilentGhost
I missed the mass part. :-)
Jason Baker
1) In your first measurement it's probably the list construction that takes the time. Try with a tuple. 2) CPython performs uniformly well; however, other Python implementations perform far worse with + and +=.
kaizer.se
A: 

This URL has comparisons of the different approaches along with some benchmarking:

http://skymind.com/~ocrow/python%5Fstring/

fengshaun