ansaurus

Question

How to split strings into text and number?

Answer 1

+6 A:

I would approach this by using re.match in the following way:

import re

match = re.match(r"([a-z]+)([0-9]+)", 'foofo21', re.I)
if match:
    items = match.groups()
    # items is ("foofo", "21")

Evan Fosmark 2009-01-09 23:12:01

You probably want \w instead of [a-z] and \d instead of [0-9].

Dan 2009-01-09 23:22:04

@Dan: Using \w is a poor choice, as it matches all alphanumeric characters (and the underscore), not just a-z. So the first group would greedily capture everything except the last digit.

Evan Fosmark 2009-01-09 23:30:09

Not if you match it non-greedily, as I do in my answer.

PEZ 2009-01-09 23:46:18
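
To make the greedy versus non-greedy point concrete, here is a small sketch (the variable names are illustrative, not from the thread). With a greedy \w+, the first group backtracks just far enough to leave a single digit for \d+; with the non-greedy \w+?, the first group stops right before the first digit.

```python
import re

s = "foofo21"

# Greedy: \w+ grabs as much as possible, then gives back a single
# character so that \d+ can still match.
greedy = re.match(r"(\w+)(\d+)", s).groups()

# Non-greedy: \w+? grabs as little as possible, so it stops right
# before the first digit.
lazy = re.match(r"(\w+?)(\d+)", s).groups()

print(greedy)  # ('foofo2', '1')
print(lazy)    # ('foofo', '21')
```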

What about upper case?

Bernard 2009-01-10 00:35:32

@Bernard, notice the `re.I` at the end. That makes case a non-issue.

Evan Fosmark 2009-01-10 00:56:36

You might get some false positives using this method. If you tried m = r.match("abc123def"), then m.groups() would get you ('abc', '123'). That's because re.match() matches from the beginning of a string but doesn't need to match the entire string.

eksortso 2009-01-10 01:24:55

If that's a concern, you can tack '\b' (IIRC) at the end, to specify that the match must end at a word boundary (or '$' to match the end of the string).

Jeff Shannon 2009-01-10 08:17:08

Answer 2

+5 A:

>>> import re
>>> r = re.compile(r"([a-zA-Z]+)([0-9]+)")
>>> m = r.match("foobar12345")
>>> m.group(1)
'foobar'
>>> m.group(2)
'12345'

So, if you have a list of strings with that format:

import re
r = re.compile("([a-zA-Z]+)([0-9]+)")
strings = ['foofo21', 'bar432', 'foobar12345']
print([r.match(s).groups() for s in strings])

Output:

[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

Federico Ramponi 2009-01-09 23:12:16
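
One caveat with the list comprehension above: r.match() returns None for a string that does not fit the pattern, so calling .groups() on it raises AttributeError. A hedged sketch of one way to skip such strings (the '---' entry is made-up test data):

```python
import re

r = re.compile(r"([a-zA-Z]+)([0-9]+)")
strings = ['foofo21', 'bar432', '---', 'foobar12345']

# r.match() returns None for '---', so drop the misses before
# calling .groups().
matches = (r.match(s) for s in strings)
pairs = [m.groups() for m in matches if m is not None]

print(pairs)  # [('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
```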

Answer 3

+1 A:

I'm always the one to bring up findall() =)

>>> import re
>>> strings = ['foofo21', 'bar432', 'foobar12345']
>>> [re.findall(r'(\w+?)(\d+)', s)[0] for s in strings]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]

Note that I'm using a simpler (less to type) regex than most of the previous answers.

PEZ 2009-01-09 23:40:54

r'\w' matches '_'. I don't see '_' in the question.

J.F. Sebastian 2009-01-10 00:52:57

I don't see A-Z in the question. It says "text and numbers".

PEZ 2009-01-10 09:39:29

@PEZ: If you allow any text except numbers then your regexp should be r'(\D+)(\d+)'.

J.F. Sebastian 2009-01-10 13:33:32
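
For illustration, this is how the r'(\D+)(\d+)' variant behaves on text that contains characters other than letters. The sample strings are invented for this sketch.

```python
import re

strings = ['foo_bar21', 'foo-bar 432', 'foo.bar7']

# \D+ accepts any run of non-digit characters, including '_', '-',
# '.', and spaces.
pairs = [re.match(r'(\D+)(\d+)', s).groups() for s in strings]

print(pairs)
# [('foo_bar', '21'), ('foo-bar ', '432'), ('foo.bar', '7')]
```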

\w makes the most sense

PEZ 2009-01-10 22:24:36

Answer 4

+3 A:

Yet Another Option:

>>> import re
>>> [re.split(r'(\d+)', s) for s in ('foofo21', 'bar432', 'foobar12345')]
[['foofo', '21', ''], ['bar', '432', ''], ['foobar', '12345', '']]

J.F. Sebastian 2009-01-10 00:54:09

Neat. Or even: [re.split(r'(\d+)', s)[0:2] for s in ...] to get rid of that extra empty string. Note, though, that unlike \w this treats the text part as anything that is not a digit, i.e. [^\d].

PEZ 2009-01-10 11:32:34

@PEZ: There may be more than one pair, and an empty string may be at the beginning of the list. You could remove empty strings with `[list(filter(None, re.split(r'(\d+)', s))) for s in ('foofo21','a1')]`

J.F. Sebastian 2009-01-10 19:47:36
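
A short sketch of the empty-string cases mentioned above, assuming inputs with a leading number or with several text/number pairs. The helper name split_runs is hypothetical.

```python
import re

def split_runs(s):
    """Split s into alternating digit / non-digit runs, dropping empties."""
    return [part for part in re.split(r'(\d+)', s) if part]

print(split_runs('foofo21'))  # ['foofo', '21']
print(split_runs('a1b22'))    # ['a', '1', 'b', '22']
print(split_runs('21foo'))    # ['21', 'foo']
```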

Answer 5

+1 A:

>>> def mysplit(s):
...     head = s.rstrip('0123456789')
...     tail = s[len(head):]
...     return head, tail
... 
>>> [mysplit(s) for s in ['foofo21', 'bar432', 'foobar12345']]
[('foofo', '21'), ('bar', '432'), ('foobar', '12345')]
>>>

Mike 2009-01-10 06:17:25
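
For reference, a sketch of how the rstrip() approach above behaves on inputs that are not in the question (no digits, only digits, digits in the middle). It never fails; it just returns an empty head or tail.

```python
def mysplit(s):
    # Strip trailing digits to get the head; the rest is the tail.
    head = s.rstrip('0123456789')
    tail = s[len(head):]
    return head, tail

print(mysplit('foo'))    # ('foo', '')
print(mysplit('123'))    # ('', '123')
print(mysplit('a1b22'))  # ('a1b', '22')
```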
