Similar questions have been asked on this same subject (one includes a good speed comparison). Hopefully this question is different, and updated for Python 2.6 and 3.0.

So far I believe the fastest and most compatible method (across different Python versions) is the plain, simple + sign:

text = "whatever" + " you " + SAY

But I keep hearing and reading it's not secure and / or advisable.

I'm not even sure how many methods there are to manipulate strings! I could count only about 4: there's interpolation and all its sub-options, such as % and format, and then there are the simple ones, join and +.

Finally, the new approach to string formatting, format, is certainly not good for backwards compatibility, while at the same time % is not good for forward compatibility. But should format be used for every string manipulation, including every concatenation, whenever we restrict ourselves to 3.x only?

Well, maybe this is more of a wiki than a question, but I do wish to have an answer on which is the proper usage of each string manipulation method. And which one could be generally used with each focus in mind (best all around for compatibility, for speed and for security).

Thanks.

edit: I'm not sure I should accept an answer if I don't feel it really answers the question... But my point is that all 3 of them together do a proper job.

Daniel's most voted answer is actually the one I'd prefer to accept, if not for the "note". I strongly disagree with "concatenation is strictly using the + operator to concatenate strings" because, for one, join does string concatenation as well, and we can build any arbitrary library for that.

All 3 current answers are valuable, and I'd rather have one answer mixing them all. While nobody volunteers to do that, I guess by choosing the least voted one (but fairly broader than THC4k's, which is more like a large and very welcome comment) I can draw attention to the others as well.

+5  A: 

As a note: Really this is all about string construction and not concatenation, per se, as concatenation is strictly using the + operator to concatenate strings together one after the other.

  • + (concatenation) - generally inefficient, but easier to read for some people. Only use it when readability is the priority and performance is not (simple scripts, throwaway scripts, non-performance-intensive code).
  • join (building a string from a sequence of strings) - use this when you have a sequence of strings that you need to join using a common separator (or no separator at all, if you join on the empty string '').
  • % and format (interpolation) - basically every other operation should use whichever of these is appropriate. Choose based on which version of Python you want to support for the lifetime of the code (use % for 2.x and format for 3.x).
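
A minimal sketch of the three approaches above, side by side. The variable names (`name`, `count`, `words`) are made up for illustration and are not from the question.

```python
# Hypothetical values, for illustration only.
name = "world"
count = 3
words = ["a", "sequence", "of", "strings"]

# + (concatenation): fine for a handful of pieces that are already strings.
greeting = "hello, " + name + "!"

# join: build one string from a sequence, using a common separator.
sentence = " ".join(words)
packed = "".join(words)  # empty separator

# % and format (interpolation): non-string values are converted for you.
old_style = "%s has %d items" % (name, count)
new_style = "{0} has {1} items".format(name, count)  # works on 2.6+ and 3.x

print(greeting)
print(sentence)
print(old_style == new_style)
```

Note that format with explicit indexes ({0}, {1}) is available from Python 2.6 onward, so the 2.x/3.x split is a guideline rather than a hard rule.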
Daniel DiPaolo
Well, my focus here is just on concatenation; I don't really want to go into formatting strings and all that. But fair enough, when talking about `format`, `%` and other functions it would be better to say "construction", and I hadn't even thought of that word. I don't know, do you still think it's better if I change the title?
Cawas
As for concatenation speed, take a look at that link I gave for speed comparison. You'd be surprised. Plus, I don't really want to discuss design / readability here. I think in this case it's very subjective.
Cawas
+3  A: 

Using + is OK, but not if it's automated:

a + small + number + of + strings + "is pretty fast"

but this can be very slow:

s = ''
for line in anything:
    s += line

Use this instead:

s = ''.join([line for line in anything])
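
For reference, a small sketch showing that the loop and join build the same string. Here `anything` is a made-up list of lines; `str.join` takes any iterable of strings, so a list, a generator expression and a plain iterator all work.

```python
# `anything` stands in for any iterable of strings (hypothetical data).
anything = ["first line\n", "second line\n", "third line\n"]

# Repeated += in a loop.
looped = ''
for line in anything:
    looped += line

# str.join with a list, a generator expression and an iterator.
from_list = ''.join(anything)
from_generator = ''.join(line for line in anything)
from_iterator = ''.join(iter(anything))

print(looped == from_list == from_generator == from_iterator)
```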

There are pros and cons to using + vs '%s' % line - using + will fail here:

s = 'Error - unexpected string' + 42

Whether you want it to throw an exception, or silently do something unusual depends on your use.
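
A short sketch of that difference, assuming a hypothetical non-string `value`. With + the mismatch raises TypeError, while %s and format convert the value without complaint.

```python
value = 42  # hypothetical non-string value

# + raises TypeError when one operand is not a string.
try:
    s = 'Error - unexpected string ' + value
    raised = False
except TypeError:
    raised = True

# An explicit str() makes the conversion visible.
explicit = 'Error - unexpected string ' + str(value)

# %s and format convert the value silently.
percent = 'Error - unexpected string %s' % value
formatted = 'Error - unexpected string {0}'.format(value)

print(raised)
print(explicit == percent == formatted)
```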

wisty
But are you saying `join` performs better than `+` even for small strings? I'd venture that almost all software concatenates small strings on almost every line of code... But people seem to talk as if it were the other way around when discussing this subject.
Cawas
`s = ''.join([line for line in anything])` has a pointless loop to construct a pointless list -> `s=''.join(anything)`
THC4k
@Cawas, for a small number of strings, there's no real difference. For a large number (say 100) of small strings (when the final result is a big string), join is faster. Both should be very fast. Readability is more important.
wisty
@THC4k, it's not pointless if `anything` is an iterator. `''.join` wants a list, not an iterator.
wisty
+3  A: 

The problem with + for strings is the same as in many other languages: each time you extend the string, it is copied. So to construct a single string from 100 substrings, Python copies the growing string at each of the 99 steps.
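
To put rough numbers on that, here is a back-of-the-envelope sketch under a simple model where every + builds a brand-new string. It counts characters copied, not time, and CPython can sometimes extend a string in place, so treat the first figure as an upper bound rather than a measurement.

```python
pieces = ['pretty short'] * 100  # 100 strings of 12 characters each

# Model for repeated +: each step writes a new string holding everything so far.
copied_by_plus = 0
length = len(pieces[0])
for piece in pieces[1:]:  # 99 concatenation steps
    length += len(piece)
    copied_by_plus += length

# Model for join: the final size is known up front, so each character is copied once.
copied_by_join = sum(len(piece) for piece in pieces)

print(copied_by_plus)  # 60588
print(copied_by_join)  # 1200
```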

And that takes some time:

# join 100 pretty short strings
python -m timeit -s "s = ['pretty short'] * 100" "t = ''.join(s)"
100000 loops, best of 3: 4.18 usec per loop

# same thing, about 7 times slower
python -m timeit -s "s = ['pretty short'] * 100" "t = ''" "for x in s:" " t+=x"
10000 loops, best of 3: 30 usec per loop
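
The same comparison can be run from inside Python with the timeit module. This is only a sketch: the `number` and `repeat` values are arbitrary choices, and the absolute timings will differ by machine and Python version.

```python
import timeit

setup = "s = ['pretty short'] * 100"

join_stmt = "t = ''.join(s)"
plus_stmt = "t = ''\nfor x in s:\n    t += x"

loops = 10000  # kept small so the sketch runs quickly; raise it for steadier numbers

# Take the best of 3 runs, as the command-line tool does.
join_time = min(timeit.repeat(join_stmt, setup=setup, repeat=3, number=loops))
plus_time = min(timeit.repeat(plus_stmt, setup=setup, repeat=3, number=loops))

print("join: %.2f usec per loop" % (join_time / loops * 1e6))
print("+=:   %.2f usec per loop" % (plus_time / loops * 1e6))
```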
THC4k
Thanks for this clarification!
Cawas