I have a bunch of strings:

"10people"
"5cars"
..

How would I split these into the following?

['10','people']
['5','cars']

It can be any amount of numbers and text.

I'm thinking about writing some sort of regex - however I'm sure there's an easy way to do it in Python.

+8  A: 

Use the regex (\d+)([a-zA-Z]+).

import re
a = ["10people", "5cars"]
[re.match('^(\\d+)([a-zA-Z]+)$', x).groups() for x in a]

Result:

[('10', 'people'), ('5', 'cars')]
KennyTM
-1: Doesn't work on strings like `"10cars5toys"`. It even throws an exception: `AttributeError: 'NoneType' object has no attribute 'groups'` trying to do the list comprehension on a non-match.
Tim Pietzcker
@Tim: This is not even specified!
KennyTM
@Tim: why do you think that it needs to work for such strings? I don't see OP asking for this. I don't see you asking for this. It's not the problem that is being solved!
SilentGhost
The OP writes "It can be any amount of numbers and text". I don't seem to be the only one who has understood it that way. Perhaps the OP could clarify. In the meantime, I'll retract my vote.
Tim Pietzcker
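For input that might not match, here is a minimal sketch that avoids the AttributeError mentioned in the comments by checking the match object before calling groups(). The name split_leading_number is hypothetical, and returning None for a non-match is an assumption about the desired behavior, not something the OP specified.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
_PATTERN = re.compile(r'(\d+)([a-zA-Z]+)')

def split_leading_number(s):
    """Return (digits, letters) if s is digits followed by letters, else None."""
    m = _PATTERN.fullmatch(s)  # fullmatch anchors at both ends, like ^...$
    return m.groups() if m else None

a = ["10people", "5cars", "10cars5toys"]
result = [split_leading_number(x) for x in a]
print(result)  # [('10', 'people'), ('5', 'cars'), None]
```

Whether a non-match should yield None, raise, or be skipped depends on what the caller expects.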
+3  A: 
>>> import re
>>> re.findall(r"\d+|[a-zA-Z]+", "10people")
['10', 'people']

>>> re.findall(r"\d+|[a-zA-Z]+", "10people5cars")
['10', 'people', '5', 'cars']
S.Mark
+7  A: 
>>> import re
>>> re.findall(r'(\d+|[a-zA-Z]+)', '12fgsdfg234jhfq35rjg')
['12', 'fgsdfg', '234', 'jhfq', '35', 'rjg']
Ignacio Vazquez-Abrams
I would probably use \D instead of [a-zA-Z], which would basically split numbers and non-numbers apart. In other words '(\d+|\D+)'
Lasse V. Karlsen
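To illustrate the difference this comment points at, a small sketch comparing the two patterns on a string containing spaces and punctuation. The sample string is made up for the example.

```python
import re

s = "10 people, 5cars"  # hypothetical input with a space and a comma

# [a-zA-Z]+ keeps only ASCII letters, so the space and comma are dropped.
letters_only = re.findall(r'\d+|[a-zA-Z]+', s)

# \D+ keeps every non-digit character, including the space and comma.
non_digits = re.findall(r'\d+|\D+', s)

print(letters_only)  # ['10', 'people', '5', 'cars']
print(non_digits)    # ['10', ' people, ', '5', 'cars']
```

Which one is preferable depends on whether the separators between digits and text should be kept or discarded.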
A: 
>>> import re
>>> s = '10cars'
>>> m = re.match(r'(\d+)([a-z]+)', s)
>>> print(m.group(1))
10
>>> print(m.group(2))
cars
Dominic Rodger
A: 

If you are like me and go to great lengths to avoid regexes just because they are ugly, here is a non-regex approach:

data = "5people10cars"

numbers = "".join(ch if ch.isdigit() else "\n" for ch in data).split()
names = "".join(ch if not ch.isdigit() else "\n" for ch in data).split()

final = list(zip(numbers, names))  # list() so the pairs are materialized on Python 3 as well
jsbueno
Ugh. Regexes are a thing of beauty compared to that! ;-)
Wim Hollebrandse
No... this is beautiful. You should have seen me trying to do that in a single pass, which would involve dynamically declaring a class (with type()) and instantiating it inside the generator expression. _That_ was too ugly even for me to go ahead with.
jsbueno
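For what it's worth, a single pass without regexes can also be sketched with itertools.groupby, which groups consecutive characters by whether they are digits. This is a different technique from the one in the answer, and split_runs is a hypothetical name. The pairing step assumes the string starts with a number and strictly alternates.

```python
from itertools import groupby

def split_runs(data):
    """Split data into consecutive runs of digits and non-digits."""
    return [''.join(run) for _, run in groupby(data, key=str.isdigit)]

parts = split_runs("5people10cars")
print(parts)  # ['5', 'people', '10', 'cars']

# Pair each number with the text that follows it
# (assumes the input begins with digits).
pairs = list(zip(parts[0::2], parts[1::2]))
print(pairs)  # [('5', 'people'), ('10', 'cars')]
```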
+1  A: 

In general, a split on /(?<=[0-9])(?=[a-z])|(?<=[a-z])(?=[0-9])/i separates a string that way.

Anonymous
+1, for the principle of defining split points using lookaround. However, this doesn't work in Python versions before 3.7, where `re.split()` never splits on empty pattern matches according to the documentation.
Tim Pietzcker
So insert a delimiter, then split on it: `re.sub(r'(?<=\d)(?=\D)|(?<=\D)(?=\d)','!SPLIT_ME!',s).split(r'!SPLIT_ME!')` ;)
Alan Moore
A: 

Piggybacking on jsbueno's idea, using str.translate, followed by split:

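# Python 3: str.maketrans replaces Python 2's string.maketrans, so no import is needed.
allchars = ''.join(chr(i) for i in range(32, 256))
# Each table maps the characters to keep onto themselves and every other character to a space.
digExtractTrans = str.maketrans(allchars, ''.join(ch if ch.isdigit() else ' ' for ch in allchars))
alpExtractTrans = str.maketrans(allchars, ''.join(ch if ch.isalpha() else ' ' for ch in allchars))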

data = "5people10cars"
numbers = data.translate(digExtractTrans).split()
names = data.translate(alpExtractTrans).split()

You only need to create the translation tables once, then call translate and split as often as you want.

Paul McGuire