Maybe I'm overlooking something in the Python re library, but how can I get the start and end positions of all matches of my pattern in a string?

For example, for the pattern r'[a-z]' on the string 'a1b2c3d4', I'd want the positions where it finds each letter (ideally I'd like to get the text of each match back too).

Any ideas?

+1  A: 

See if this helps: Match Objects.
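A minimal sketch of the match object accessors that section covers, assuming Python 3. The sample string '1a2b' and the variable names are illustrative only:

```python
import re

text = '1a2b'
# search() scans the string and returns the first match object (or None)
m = re.search(r'[a-z]', text)

start = m.start()    # index of the first matched character
end = m.end()        # index one past the last matched character
span = m.span()      # (start, end) as a single tuple
matched = m.group()  # the matched text itself

# slicing the original string with start/end recovers the matched text
assert text[start:end] == matched
print(start, end, span, matched)  # 1 2 (1, 2) a
```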

EBGreen
+5  A: 
import re
p = re.compile("[a-z]")
for m in p.finditer('a1b2c3d4'):
    # start()/end() are the match boundaries; group() is the matched text
    print(m.start(), m.end(), m.group())
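If you want the results collected rather than printed, a list comprehension over finditer() gives the start, end, and matched text in one pass. This is a sketch assuming Python 3; the name `positions` is just illustrative:

```python
import re

pattern = re.compile(r'[a-z]')
text = 'a1b2c3d4'

# one (start, end, text) tuple per match, in left-to-right order
positions = [(m.start(), m.end(), m.group()) for m in pattern.finditer(text)]
print(positions)
# [(0, 1, 'a'), (2, 3, 'b'), (4, 5, 'c'), (6, 7, 'd')]
```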
Peter Hoffmann
+5  A: 

Taken from

http://www.amk.ca/python/howto/regex/

span() returns both start and end indexes in a single tuple. Since the match method only checks if the RE matches at the start of a string, start() will always be zero. However, the search method of RegexObject instances scans through the string, so the match may not start at zero in that case.

>>> import re
>>> p = re.compile('[a-z]+')
>>> print(p.match('::: message'))
None
>>> m = p.search('::: message'); print(m)
<re.Match object; span=(4, 11), match='message'>
>>> m.group()
'message'
>>> m.span()
(4, 11)
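To see the contrast between match() and search() that the quoted passage describes, here is a small sketch, assuming Python 3; the input strings are made up for illustration:

```python
import re

p = re.compile('[a-z]+')

# with no pos argument, match() only tries index 0, so a hit starts at 0
m1 = p.match('message :::')
# search() scans forward, so its match can start later in the string
m2 = p.search('::: message')

print(m1.start(), m1.span())  # 0 (0, 7)
print(m2.start(), m2.span())  # 4 (4, 11)
```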

Combine that with:

Since Python 2.2, the finditer() method is also available, returning a sequence of MatchObject instances as an iterator.

>>> p = re.compile( ... )
>>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...')
>>> iterator
<callable_iterator object at 0x401833ac>
>>> for match in iterator:
...     print(match.span())
...
(0, 2)
(22, 24)
(29, 31)
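One practical consequence of finditer() returning an iterator is that it can be consumed only once. A sketch, assuming Python 3 and reusing the question's pattern and string rather than the elided one above:

```python
import re

p = re.compile(r'[a-z]')
iterator = p.finditer('a1b2c3d4')

# finditer() yields match objects lazily; collect the spans in one pass
spans = [m.span() for m in iterator]
print(spans)  # [(0, 1), (2, 3), (4, 5), (6, 7)]

# the iterator is now exhausted, so iterating it again yields nothing
leftover = list(iterator)
print(leftover)  # []
```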

You should be able to do something along the lines of:

import re
for match in re.finditer(r'[a-z]', 'a1b2c3d4'):
    print(match.span(), match.group())
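The module-level re.finditer() and the finditer() method of a compiled pattern find the same matches; the compiled form mainly saves recompiling when the regex is reused. A quick sketch of that equivalence, assuming Python 3:

```python
import re

text = 'a1b2c3d4'

# module-level helper: compiles (and caches) the pattern internally
via_module = [m.span() for m in re.finditer(r'[a-z]', text)]

# precompiled pattern: handy when the same regex is reused many times
pattern = re.compile(r'[a-z]')
via_compiled = [m.span() for m in pattern.finditer(text)]

assert via_module == via_compiled
print(via_module)  # [(0, 1), (2, 3), (4, 5), (6, 7)]
```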
kanja