I'm trying to do a simple regex split in Python. The string is in the form of FooX where Foo is some string and X is an arbitrary integer. I have a feeling this should be really simple, but I can't quite get it to work.

On that note, can anyone recommend some good regex reading material?

A: 

Assuming you want to split between the "Foo" and the number, you'd want something like:

r/(?<=\D)(?=\d)/

Which will match at a point between a nondigit and a digit, without consuming any characters in the split.
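
For what it's worth, on Python 3.7 and later, re.split() does split on zero-width matches, so a lookaround pattern like this can be passed to it directly. A minimal sketch, assuming Python 3.7+ (the name split_at_digit_boundary is made up for illustration):

```python
import re

# Zero-width boundary: preceded by a non-digit, followed by a digit.
BOUNDARY = re.compile(r'(?<=\D)(?=\d)')

def split_at_digit_boundary(s):
    """Split s wherever a non-digit is immediately followed by a digit.

    Assumes Python 3.7+; older versions do not split on empty matches.
    """
    return BOUNDARY.split(s)

print(split_at_digit_boundary('Foo123'))  # ['Foo', '123']
```

Note that this splits at every such boundary, so a string with digits in the middle yields more than two pieces.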

Anon.
Great idea, but it won't work, at least in Python. It ignores lookarounds in regexes that do not match any characters.
Max Shawabkeh
...seriously? I wonder what the rationale for that behaviour is.
Anon.
@Max S. See my edit... it appears to have some viability in Python after all.
AJ
Hmm, it seems to ignore them for `split()` but it does work for `search()`. Strange.
Max Shawabkeh
+1  A: 

Using groups:

import re

# Named groups: letters first, then digits.
m = re.match(r'^(?P<first>[A-Za-z]+)(?P<second>[0-9]+)$', "Foo9")
print(m.group('first'))
print(m.group('second'))

Using search:

import re

s = 'Foo9'
# Zero-width match at the boundary between a non-digit and a digit.
m = re.search(r'(?<=\D)(?=\d)', s)
first = s[:m.start()]
second = s[m.end():]

print(first, second)
AJ
+4  A: 

In Python versions before 3.7 you can't use split() for this, since there it has to consume some characters, but you can use normal matching to do it.

>>> import re
>>> r = re.compile(r'(\D+)(\d+)')
>>> r.match('abc444').groups()
('abc', '444')
Max Shawabkeh
A: 
>>> import re
>>> s="gnibbler1234"
>>> re.findall(r'(\D+)(\d+)',s)[0]
('gnibbler', '1234')

In the regex, \D means anything that is not a digit, so \D+ matches one or more things that are not digits.

Likewise, \d means anything that is a digit, so \d+ matches one or more digits.
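
The same \D+ / \d+ pattern can be wrapped so that input of the wrong shape is reported instead of raising an IndexError on the [0]. A rough sketch, assuming Python 3.4+ for re.fullmatch; the helper name split_name_number and the conversion to int are my own additions, not part of the answer above:

```python
import re

# \D+ = one or more non-digits, \d+ = one or more digits.
PATTERN = re.compile(r'(\D+)(\d+)')

def split_name_number(s):
    """Return (name, number) for strings like 'Foo42', or None for any other shape."""
    m = PATTERN.fullmatch(s)  # the whole string must match, not just a prefix
    if m is None:
        return None
    name, digits = m.groups()
    return name, int(digits)

print(split_name_number('gnibbler1234'))  # ('gnibbler', 1234)
```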

gnibbler
+1  A: 

Keeping it simple:

>>> import re
>>> a = "Foo1String12345"
>>> re.split(r'(\d+)$', a)[0:2]
['Foo1String', '12345']
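
In case the [0:2] looks mysterious: because the pattern contains a capturing group, re.split() keeps the matched digits in the result, and the empty text after the match shows up as a trailing ''. A small sketch of what happens before the slice:

```python
import re

a = "Foo1String12345"

# The capturing group keeps the matched digits in the result, and the
# (empty) text after the match becomes a final '' element.
parts = re.split(r'(\d+)$', a)
print(parts)       # ['Foo1String', '12345', '']
print(parts[0:2])  # ['Foo1String', '12345']
```

If the string has no trailing digits, split() returns a one-element list, so the slice yields just the original string.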
MikeyB
simple... and allowing for digits in the "arbitrary string" :-p
MikeyB