views:

546

answers:

7

I've got a file whose format I'm altering via a python script. I have several camel cased strings in this file where I just want to insert a single space before the capital letter - so "WordWordWord" becomes "Word Word Word".

My limited regex experience just stalled out on me - can someone think of a decent regex to do this, or (better yet) is there a more pythonic way to do this that I'm missing?

+9  A: 

You could try:

>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWord")
'Word Word Word'
Greg Hewgill
re.sub(r"(\w)([A-Z])", r"\1 \2", "SorryIThinkYouMissedASpot")
ΤΖΩΤΖΙΟΥ
As a small improvement, [[:upper:]] should be used instead of [A-Z].
Tomalak
A: 

With regexes you can do this:

re.sub('([A-Z])', r' \1', str)

Of course, that will only work for ASCII characters, if you want to do Unicode it's a whole new can of worms :-)

Dan
re.sub('([A-Z])', r' \1', "Do we want a space before the D's of this phrase?")
ΤΖΩΤΖΙΟΥ
Ah, yes, good point. Looks like your's and Leonhard's solutions handle this correctly.
Dan
+13  A: 

If there are consecutive capitals, then Gregs result could not be what you look for, since the \w consumes the caracter in front of the captial letter to be replaced.

>>> re.sub(r"(\w)([A-Z])", r"\1 \2", "WordWordWWWWWWWord")
'Word Word WW WW WW Word'

A look-behind would solve this:

>>> re.sub(r"(?<=\w)([A-Z])", r" \1", "WordWordWWWWWWWord")
'Word Word W W W W W W Word'
Leonhard
Dan's answer is better and simpler :)
hayalci
@hayalci: re.sub('([A-Z])', r' \1', 'Really?')
ΤΖΩΤΖΙΟΥ
+1  A: 

Have a look at my answer on .NET - How can you split a “caps” delimited string into an array?

Edit: Maybe better to include it here.

re.sub(r'([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))', r'\1 ', text)

For example:

"SimpleHTTPServer" => ["Simple", "HTTP", "Server"]
MizardX
Your answer is probably what Electrons_Ahoy really wants; however, based on the phrasing of their question, it's not.
ΤΖΩΤΖΙΟΥ
+4  A: 

Perhaps shorter:

>>> re.sub(r"\B([A-Z])", r" \1", "DoIThinkThisIsABetterAnswer?")
ΤΖΩΤΖΙΟΥ
A: 

I agree that the regex solution is the easiest, but I wouldn't say it's the most pythonic.

How about:

text = 'WordWordWord'
new_text = ''
is_first_letter = True

for letter in text:
    if not is_first_letter and letter.isupper():
        new_text += ' ' + letter
    else:
        new_text += letter

    isFirstLetter = False
monkut
This has the same problem as Dan's - you'll get extra spaces before capitals even if they aren't needed.
Brian
True, i've edited it to add a flag... I admit it's a little cumbersome, but may be easier to remember than regex.
monkut
A: 

I think regexes are the way to go here, but just to give a pure python version without (hopefully) any of the problems ΤΖΩΤΖΙΟΥ has pointed out:

def splitCaps(s):
    result = []
    for ch, next in window(s+" ", 2):
        result.append(ch)
        if next.isupper() and not ch.isspace():
            result.append(' ')
    return ''.join(result)

window() is a utility function I use to operate on a sliding window of items, defined as:

import collections, itertools

def window(it, winsize, step=1):
    it=iter(it)  # Ensure we have an iterator
    l=collections.deque(itertools.islice(it, winsize))
    while 1:  # Continue till StopIteration gets raised.
        yield tuple(l)
        for i in range(step):
            l.append(it.next())
            l.popleft()
Brian