I'm trying to split a string into words and punctuation, with each punctuation mark as its own item in the list produced by the split.
For instance:
>>> c = "help, me"
>>> print c.split()
['help,', 'me']
What I really want the list to look like is:
['help', ',', 'me']
So I want the string split at whitespace, with each punctuation mark separated from the word it is attached to.
I've tried inserting a space before each punctuation character first and then running the split:
>>> separatedPunctuation = ""
>>> for character in c:
...     if character in ".,;!?":
...         outputCharacter = " %s" % character
...     else:
...         outputCharacter = character
...     separatedPunctuation += outputCharacter
...
>>> print separatedPunctuation
help , me
>>> print separatedPunctuation.split()
['help', ',', 'me']
This produces the result I want, but is painfully slow on large files.
Is there a way to do this more efficiently?
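For reference, one approach that may avoid the character-by-character loop is a single regular-expression pass with re.findall. The following is only a minimal sketch, not a benchmarked solution: the names TOKEN_PATTERN and split_words_and_punctuation are made up for illustration, and the pattern assumes that any character that is neither a word character nor whitespace counts as punctuation, which is broader than the ".,;!?" set used above.

```python
import re

# Assumption: a token is either a run of word characters (letters, digits,
# underscore) or a single character that is neither a word character nor
# whitespace. This is broader than the ".,;!?" set in the loop above.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def split_words_and_punctuation(text):
    """Return the words and punctuation marks of text as a flat list."""
    return TOKEN_PATTERN.findall(text)


# print(...) with a single argument behaves the same on Python 2 and 3.
print(split_words_and_punctuation("help, me"))  # ['help', ',', 'me']
```

Note that this sketch does not behave identically to the loop: it also separates punctuation that has no following space, and it splits on apostrophes, so "don't" would become three items. Whether it is actually faster on large files would need to be measured.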