views:

116

answers:

5

How to define a function that takes a string (sentence) and inserts an extra space after a period if the period is directly followed by a letter.

sent = "This is a test.Start testing!"
def normal(sent):
    list_of_words = sent.split()
    ...

This should print out

"This is a test. Start testing!"

I suppose I should use split() to brake a string into a list, but what next?

P.S. The solution has to be as simple as possible.

+1  A: 

Brute force without any checks:

>>> sent = "This is a test.Start testing!"
>>> k = sent.split('.')
>>> ". ".join(l)
'This is a test. Start testing!'
>>> 

For removing spaces:

>>> sent = "This is a test. Start testing!"
>>> k = sent.split('.')
>>> l = [x.lstrip(' ') for x in k]
>>> ". ".join(l)
'This is a test. Start testing!'
>>> 
pyfunc
This doesn't take into account the requirement that the period be followed by a letter.
kindall
+3  A: 

One-liner non-regex answer:

def normal(sent):
    return ".".join(" " + s if i > 0 and s[0].isalpha() else s for i, s in enumerate(sent.split(".")))

Here is a multi-line version using a similar approach. You may find it more readable.

def normal(sent):
    sent = sent.split(".")
    result = sent[:1]
    for item in sent[1:]:
        if item[0].isalpha():
            item = " " + item
        result.append(item)
    return ".".join(result)

Using a regex is probably the better way, though.

kindall
Cool! works good!) Thnks
Gusto
Very clever. I love that it avoids regular expressions. I don't, however, like it's level of complexity. I wish I could give a 1/2 vote.
Steven Rumbalski
Yeah, that would probably be much more readable as a traditional for loop. :-) In fact I will just go ahead and add such a version in an edit.
kindall
+1. Good update.
Steven Rumbalski
A: 

Improving pyfunc's answer:

sent="This is a test.Start testing!"

k=sent.split('.')

k='. '.join(k)

k.replace('. ','. ')

'This is a test. Start testing!'

AJ00200
+7  A: 

Use re.sub. Your regular expression will match a period (\.) followed by a letter ([a-zA-Z]). Your replacement string will contain a reference to the second group (\2), which was the letter matched in the regular expression.

>>> import re
>>> re.sub(r'\.([a-zA-Z])', r'. \1', 'This is a test.This is a test. 4.5 balloons.')
'This is a test. This is a test. 4.5 balloons'

Note the choice of [a-zA-Z] for the regular expression. This matches just letters. We do not use \w because it would insert spaces into a decimal number.

Steven Rumbalski
+1  A: 

Another regex-based solution, might be a tiny bit faster than Steven's (only one pattern match, and a blacklist instead of a whitelist):

import re
re.sub(r'\.([^\s])', r'. \1', some_string)
Dor
Good catch. I didn't need to make two groups, just one. But [^\s] would split on more than letters. Most importantly, decimal numbers would have spaces inserted, so I would stick with [a-zA-Z].
Steven Rumbalski
True that, point taken.
Dor