views: 1151
answers: 4
I'm writing a Python function to split text into words while ignoring specified punctuation. Here is some working code. I'm not convinced that building strings out of lists (buf = [] in the code) is efficient, though. Does anyone have a suggestion for a better way to do this?

def getwords(text, splitchars=' \t|!?.;:"'):
    """
    Generator to get words in text by splitting text along specified splitchars
    and stripping out the splitchars::

      >>> list(getwords('this is some text.'))
      ['this', 'is', 'some', 'text']
      >>> list(getwords('and/or'))
      ['and', 'or']
      >>> list(getwords('one||two'))
      ['one', 'two']
      >>> list(getwords(u'hola unicode!'))
      [u'hola', u'unicode']
    """
    splitchars = set(splitchars)
    buf = []
    for char in text:
        if char not in splitchars:
            buf.append(char)
        else:
            if buf:
                yield ''.join(buf)
                buf = []
    # All done. Yield last word.
    if buf:
        yield ''.join(buf)
+4  A: 

You don't want to use re.split?

>>> import re
>>> re.split("[,; ]+", "coucou1 ,   coucou2;coucou3")
['coucou1', 'coucou2', 'coucou3']
poulejapon
Didn't think of that at all. Will consider it. Thanks!
Jace
+3  A: 

http://www.skymind.com/~ocrow/python_string/ talks about several ways of concatenating strings in Python and assesses their performance as well.

Vijay Dev
This was what I needed. Thanks. cStringIO appears the best choice for my use case.
Jace
Uh oh. cStringIO can't handle unicode strings.
Jace
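For what it's worth, plain ''.join() copes with unicode just fine, which sidesteps the cStringIO limitation; a minimal check:

```python
# Quick check that str.join handles unicode text, unlike the old
# Python 2 cStringIO module mentioned above.
parts = [u'hola', u' ', u'unicode']
joined = u''.join(parts)
assert joined == u'hola unicode'
```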
For what it's worth: I hacked on that testcase until it ran on my Python 2.5 install, and found Method 6 (feed ''.join a list comprehension) to be consistently fastest. 6 with generator expressions turned out *slower* but still second-fastest.
kquinn
In order from fastest to slowest, the methods ended up being 6, 7, 4, 1, 5, 3, 2. (7 is 6 with the brackets dropped to make it a generator expression not list comprehension). I was unable to measure memory use.
kquinn
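A minimal sketch of the comparison kquinn describes, timing ''.join over a list comprehension versus a generator expression (method numbering as in the linked page):

```python
import timeit

chars = [chr(65 + (i % 26)) for i in range(1000)]

# Method 6: ''.join over a list comprehension.
t_list = timeit.timeit(lambda: ''.join([c for c in chars]), number=1000)
# Method 7: the same with a generator expression.
t_gen = timeit.timeit(lambda: ''.join(c for c in chars), number=1000)

# Both build the same string; the list version tends to win because
# str.join can size the result in one pass over a list.
assert ''.join([c for c in chars]) == ''.join(c for c in chars)
```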
+1  A: 

You can split the input using re.split():

>>> splitchars=' \t|!?.;:"'
>>> re.split("[%s]" % splitchars, "one\ttwo|three?four")
['one', 'two', 'three', 'four']
>>>

EDIT: If your splitchars may contain regex-special characters like ] or ^, you can use re.escape():

>>> re.escape(splitchars)
'\\ \\\t\\|\\!\\?\\.\\;\\:\\"'
>>> re.split("[%s]" % re.escape(splitchars), "one\ttwo|three?four")
['one', 'two', 'three', 'four']
>>>
gimel
That one's risky. What if splitchars starts with a '^' or contains a ']'?
Jace
Escape them. See edit.
gimel
+2  A: 

You can use re.split

re.split(r'[\s|!?.;:"]', text)

However, if the text is very large, the resulting list may consume too much memory. In that case, consider re.finditer:

import re

def getwords(text, splitchars=' \t|!?.;:"'):
    # Negated character class: runs of one or more non-separator chars.
    # re.escape() keeps characters like ] or ^ from breaking the class.
    # (Prefixing each char with "^" as a negator only works for the first
    # one; the rest would be treated as literal '^' characters.)
    pattern = "[^%s]+" % re.escape(splitchars)
    for match in re.finditer(pattern, text):
        yield match.group()

# a quick test
s = "a:b cc? def...a||"
words = list(getwords(s))
assert ["a", "b", "cc", "def", "a"] == words, words
Jiayao Yu