ansaurus

Question

Python: Split a string at uppercase letters

Answer 1

+16 A:

Unfortunately it's not possible to split on a zero-width match in Python (re.split only gained that ability in Python 3.7). But you can use re.findall instead:

>>> import re
>>> re.findall('[A-Z][^A-Z]*', 'TheLongAndWindingRoad')
['The', 'Long', 'And', 'Winding', 'Road']
>>> re.findall('[A-Z][^A-Z]*', 'ABC')
['A', 'B', 'C']
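On Python 3.7 and later, re.split does accept patterns that can match an empty string, so a zero-width lookahead works directly. A minimal sketch, assuming Python 3.7+; the match at position 0 produces a leading empty string, which is filtered out:

```python
import re

s = 'TheLongAndWindingRoad'

# (?=[A-Z]) is a zero-width lookahead: it matches the empty position
# just before each uppercase letter. Requires Python 3.7+.
parts = re.split(r'(?=[A-Z])', s)

# The match at position 0 yields a leading '' which we drop.
words = [p for p in parts if p]
print(words)  # ['The', 'Long', 'And', 'Winding', 'Road']
```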

Mark Byers 2010-02-17 00:04:44

Answer 2

+3 A:

import re
# On Python 3, filter() returns an iterator, so wrap it in list().
list(filter(None, re.split("([A-Z][^A-Z]*)", "TheLongAndWindingRoad")))
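To see why the filter(None, ...) step is needed: with a capturing group, re.split returns each matched word interleaved with the (here empty) text between matches. A small sketch, using a list comprehension instead of filter; it also shows that, unlike the findall answers, text before the first capital is kept:

```python
import re

pattern = r"([A-Z][^A-Z]*)"

# The capturing group makes re.split keep each match, interleaved with
# the (empty) text between consecutive matches.
raw = re.split(pattern, "TheLongAndWindingRoad")
print(raw)  # ['', 'The', '', 'Long', '', 'And', '', 'Winding', '', 'Road', '']

# Dropping the empty strings leaves just the words.
words = [w for w in raw if w]
print(words)  # ['The', 'Long', 'And', 'Winding', 'Road']

# Text before the first uppercase letter is not part of any match,
# so it survives as an ordinary split piece.
prefixed = [w for w in re.split(pattern, "helloWorld") if w]
print(prefixed)  # ['hello', 'World']
```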

Gabe 2010-02-17 00:07:51

Answer 3

+8 A:

>>> import re
>>> re.findall('[A-Z][a-z]*', 'TheLongAndWindingRoad')
['The', 'Long', 'And', 'Winding', 'Road']

>>> re.findall('[A-Z][a-z]*', 'SplitAString')
['Split', 'A', 'String']

>>> re.findall('[A-Z][a-z]*', 'ABC')
['A', 'B', 'C']

If you want "It'sATest" to split into ["It's", 'A', 'Test'], change the regex to "[A-Z][a-z']*"
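The apostrophe variant in action; adding ' to the character class lets a contraction stay in one piece:

```python
import re

# The apostrophe in the character class keeps "It's" together.
result = re.findall("[A-Z][a-z']*", "It'sATest")
print(result)  # ["It's", 'A', 'Test']
```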

gnibbler 2010-02-17 00:14:03

+1 for being the first to get ABC working. I've also updated my answer now.

Mark Byers 2010-02-17 00:19:27

>>> re.findall('[A-Z][a-z]*', "It's about 70% of the Economy") ----->['It', 'Economy']

ChristopheD 2010-02-17 00:50:46

@ChristopheD: The OP doesn't say how non-alpha characters should be treated.

gnibbler 2010-02-17 01:00:11

@gnibbler: true, but this regex approach also drops every ordinary (all-lowercase) word that does not start with an uppercase letter. I doubt that was the OP's intention.

ChristopheD 2010-02-17 12:21:43

Answer 4

+1 A:

Alternative solution (if you dislike explicit regexes):

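s = 'TheLongAndWindingRoad'

# Indices of every uppercase letter.
pos = [i for i, e in enumerate(s) if e.isupper()]

parts = []
for j in range(len(pos)):
    try:
        # Slice from this capital up to the next one.
        parts.append(s[pos[j]:pos[j+1]])
    except IndexError:
        # The last capital has no successor: take the rest of the string.
        parts.append(s[pos[j]:])

print(parts)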
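The same index-based idea can be written without try/except by pairing each boundary with the next one via zip. A minimal sketch: the function name split_at_uppercase is hypothetical, and unlike the loop above it assumes any text before the first capital should be kept as its own part (the loop above drops it):

```python
def split_at_uppercase(s):
    """Hypothetical helper: split s before every uppercase letter."""
    # Boundaries: start of string, each uppercase position after it, end of string.
    bounds = [0] + [i for i, c in enumerate(s) if c.isupper() and i > 0] + [len(s)]
    # Slice between consecutive boundaries; a < b skips the empty slice for ''.
    return [s[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]

print(split_at_uppercase('TheLongAndWindingRoad'))  # ['The', 'Long', 'And', 'Winding', 'Road']
print(split_at_uppercase('helloWorld'))             # ['hello', 'World']
```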

ChristopheD 2010-02-17 00:37:13

Answer 5

+1 A:

A variation on @ChristopheD's solution

s = 'TheLongAndWindingRoad'

# Appending a sentinel 'A' adds a final boundary at len(s).
pos = [i for i, e in enumerate(s + 'A') if e.isupper()]
parts = [s[pos[j]:pos[j+1]] for j in range(len(pos) - 1)]

print(parts)
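How the sentinel works: the extra 'A' is only present while scanning, so its index, len(s), becomes the end boundary of the last part. A small sketch printing the boundaries, with zip over consecutive pairs as an equivalent way to slice:

```python
s = 'TheLongAndWindingRoad'

# The sentinel's index (21 == len(s)) closes the final slice.
pos = [i for i, c in enumerate(s + 'A') if c.isupper()]
print(pos)  # [0, 3, 7, 10, 17, 21]

# Pair each boundary with the next one and slice between them.
parts = [s[a:b] for a, b in zip(pos, pos[1:])]
print(parts)  # ['The', 'Long', 'And', 'Winding', 'Road']
```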

pwdyson 2010-02-17 02:01:39

Answer 6

+4 A:

Here is an alternative regex solution. The problem can be rephrased as "how do I insert a space before each uppercase letter before doing the split?":

>>> import re
>>> s = "TheLongAndWindingRoad ABC A123B45"
>>> re.sub( r"([A-Z])", r" \1", s).split()
['The', 'Long', 'And', 'Winding', 'Road', 'A', 'B', 'C', 'A123', 'B45']

This has the advantage of preserving all non-whitespace characters, which most other solutions do not.
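The same insert-a-space idea can also be written with a zero-width lookahead instead of a capturing group and backreference, since re.sub accepts patterns that match the empty string. A minimal sketch on the same input:

```python
import re

s = "TheLongAndWindingRoad ABC A123B45"

# (?=[A-Z]) matches the empty position before each capital, so re.sub
# inserts a space there without capturing the letter itself.
spaced = re.sub(r"(?=[A-Z])", " ", s)
words = spaced.split()
print(words)
# ['The', 'Long', 'And', 'Winding', 'Road', 'A', 'B', 'C', 'A123', 'B45']
```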

Dave Kirby 2010-02-17 08:19:04
