views:

543

answers:

6

Given this harmless little list:

>>> lst = ['o','s','s','a','m','a']

My goal is to pythonically concatenate the little devils using one of the following ways:

A. plain ol' string function to get the job done, short, no imports

>>> ''.join(lst)
'ossama'

B. lambda, lambda, lambda

>>> reduce(lambda x, y: x + y, lst)
'ossama'

C. globalization (do nothing, import everything)

>>> import functools, operator
>>> functools.reduce(operator.add, lst)
'ossama'

Please suggest other pythonic ways to achieve this magnanimous task.

Please rank (pythonic level) and rate solutions giving concise explanations.

In this case, is the most pythonic solution the best coding solution?

+30  A: 
''.join(lst)

the only pythonic way:

  • clear (that what all the big boys do and what they expect to see),
  • simple (no additional imports needed, stable across all versions),
  • fast (written in C) and
  • concise (on an empty string join elements of iterable!).
SilentGhost
While the reduce based solutions are elegant and the functional programmer in me appreciates them, I must agree that join() is indeed the only pythonic solution.
liwp
SilentGhost is right. Strings have a join method that accepts iterables, so using anything else is not pythonic.
stefanw
So -- people, pls, check the "bytearray" native type my answer bellow). Join is ubeatable up to python 2.5, though.
jsbueno
THE way, without a doubt
Khelben
+15  A: 

Have a look at Guido's essay on python optimization, it covers converting lists of numbers to strings. Unless you have a good reason to do otherwise, use the join example.

Dana the Sane
OK, guys, now let's upvote this to 10 so SilentGhost gets his Populist badge :)
Tim Pietzcker
+1  A: 

Here's the least Pythonic way:

out = ""
for x in range(len(lst)):
  for y in range(len(lst)):
    if x + y == len(lst)-1:
        out = lst[y] + out
Eli
I bet there are ways to reduce pythonicity in that. ;)
jellybean
Not really what he asked for but I see your point.....:-)
Ben Hughes
+4  A: 

I myself use the "join" way, but from python 2.6 there is a base type that is little used: bytearray.

Bytearrays can be incredible useful -- for string containing texts, since the best thing is to have then in unicode, the "join" way is the way to go -- but if you are dealing with binary data instead, bytearrays can be both more pythonic and more efficient:

>>> lst = ['o','s','s','a','m','a']
>>> a = bytearray(lst)
>>> a
bytearray(b'ossama')
>>> print a
ossama

it is a built in data type: no imports needed - just use then -- and you can use a bytearray isntead of a list to start with - so they should be more efficinet than the "join", since there is no data copying to get the string representation for a bytearray.

jsbueno
hmm..just checked: the "bytearray" constructor can get even an unicode string and an "encoding" argument so it can deal with unicode as well. eg. : a = bytearray(u"déja-vu", encoding="utf8")
jsbueno
@jsbueno: except that bytearrays are neither pythonic nor efficient. They're intended for completely different purpose, and their behaviour in py2.6 is not the same as in py3k.
SilentGhost
@SilentGhost: Thank you for your feedback - where can I read more about it? And even if they aren't as efficient as they mightlookat first, I doubt "join" could be faster simply due to the creation of a new strign object per part to concatenate. (I intend to do some benchmarking later today)
jsbueno
@jsbueno: http://docs.python.org/3.1/library/functions.html#bytearray
SilentGhost
+2  A: 

Great answer from SilenGhost BUT, just a few words about the presented reduce "alternative"

Unless you've got a very very VERY good reason to concatenate strings using + or operator.add (the most frequent one, that you've got few, fixed number of strings), you should use always join.

Just because each + generates a new string which is the concatenation of two strings, unless join that only generates one final string. So, imagine you've got 3 strings:

A + B + C
-->
D = A + B
final = D + C

Ok, doesn't seems not much, but you've got to reserve memory for D. Also, due python use of strings, generating a new, intermediate, string, it's somehow expensive...

Now, with 5 strings

A + B + C + D + E
-->
F = A + B
G = F + C
H = G + D
final = H + E

Assuming the best scenario (if we do (A+B) + (C+D) + E, we'll end having three intermediate strings at the same time on memory), that's generating 3 intermediate strings... You've got to generate a new python object, reserve memory space, release the memory a few times... Also the overhead of calling a Python function (that is not small)

Now think of it with 200 strings. We'll end up with a ridiculous big number of intermediate strings, each of one consuming combining quite a lot time on being a complete list over python, and calling a lot of operator.add functions, each with its overhead... Even if you use reduce functions, it won't help. It's a problem that has to be managed with a different approach: join, which only generates ONE complete python string, the final one and calls ONE python function.

(Of course, join, or other similar, specialized function for arrays)

Khelben
+9  A: 

Of course it's join. How do I know? Let's do it in a really stupid way:
If the problem was only adding 2 strings, you'd most likely use str1 + str2. What does it take to get that to the next level? Instinctively, for most (I think), will be to use sum. Let's see how that goes:

In [1]: example = ['a', 'b', 'c']
In [2]: sum(example, '')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython console> in <module>()
TypeError: sum() can't sum strings [use ''.join(seq) instead]

Wow! Python simply told me what to use! :)

abyx