Is there a way to dynamically update the name of regex groups in Python?

For example, if the text is:

person 1: name1
person 2: name2
person 3: name3
...
person N: nameN

How would you name groups 'person1', 'person2', 'person3', ..., and 'personN' without knowing beforehand how many people there are?

A: 

No, but you can do something like this:

>>> import re
>>> p = re.compile(r'(?m)^(.*?)\s*:\s*(.*)$')
>>> text = '''person 1: name1
person 2: name2
person 3: name3
...
person N: nameN'''
>>> p.findall(text)

output:

[('person 1', 'name1'), ('person 2', 'name2'), ('person 3', 'name3'), ('person N', 'nameN')]
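
If the goal is lookup by a name such as 'person1', one option is to build the keys after matching instead of inside the pattern. A minimal sketch, assuming Python 3 and that removing the space from the matched label is an acceptable way to form the key (the `people` name is made up for illustration):

```python
import re

text = '''person 1: name1
person 2: name2
person 3: name3
...
person N: nameN'''

p = re.compile(r'(?m)^(.*?)\s*:\s*(.*)$')

# Derive the "group names" from the matched labels: 'person 1' -> 'person1'.
people = {label.replace(' ', ''): name for label, name in p.findall(text)}

print(people['person1'])
print(sorted(people))
```

The pattern still has only two groups; the number of keys depends on how many lines match, not on the regex.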
Bart Kiers
A: 

Named capture groups and numbered groups (\1, \2, etc.) cannot be dynamic, because their number is fixed by the pattern, but you can achieve the same thing with findall:

re.findall(pattern, string[, flags])

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result unless they touch the beginning of another match.
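
To make the three return shapes described above concrete, here is a small sketch (Python 3 assumed; the sample string and patterns are illustrative, not from the question):

```python
import re

s = 'person 1: name1\nperson 2: name2'

# No groups: each item is the whole match.
no_groups = re.findall(r'name\d', s)
# One group: each item is that group's text.
one_group = re.findall(r'person (\d)', s)
# Two groups: each item is a tuple of the groups.
two_groups = re.findall(r'person (\d): (\w+)', s)

print(no_groups)   # ['name1', 'name2']
print(one_group)   # ['1', '2']
print(two_groups)  # [('1', 'name1'), ('2', 'name2')]
```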

jspcal
A: 

Regexes in Python (and I'm pretty certain that this is true for regexes in general) don't allow for an arbitrary number of capturing groups: the number of groups is fixed when the pattern is written. You can either capture a repeated match in its entirety (by placing capturing parentheses around a repeated group) or capture the last match in a series of matches (by repeating a capturing group). This is independent of whether these are named or numbered capturing groups.
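
A short sketch of the two options just described, assuming Python 3 and a made-up input string:

```python
import re

s = 'a,b,c'

# Capturing parentheses around the repeated group: one capture holds the whole run.
whole = re.match(r'((?:\w,?)+)', s)
# Repeating the capturing group itself: only the last iteration is kept.
last = re.match(r'(?:(\w),?)+', s)

print(whole.group(1))  # a,b,c
print(last.group(1))   # c
```

In neither case do the individual items 'a', 'b' and 'c' end up in separate groups.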

You need to do this programmatically by iterating over all matches in a string, for example:

for match in re.findall(pattern, string):
    do_something(match)
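
If named access is still wanted, one possible variation is re.finditer with a fixed pair of named groups that is reused for every match. This is a sketch under assumptions: Python 3, and the group names 'label' and 'name' are chosen here for illustration.

```python
import re

text = 'person 1: name1\nperson 2: name2\nperson 3: name3'

# The group names are fixed in the pattern; only the number of matches varies.
pattern = re.compile(r'^(?P<label>.*?)\s*:\s*(?P<name>.*)$', re.MULTILINE)

# One dict per match, keyed by group name.
rows = [m.groupdict() for m in pattern.finditer(text)]
for row in rows:
    print(row['label'], '->', row['name'])
```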
Tim Pietzcker
A: 

Judging from your accepted answer, there's no need for a regex:

p="""
person 1: name1
person 2: name2
person 3: name3
person N: nameN
"""

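ARR = []
for item in p.split("\n"):
    if item:  # skip the empty strings left by the leading and trailing newlines
        s = item.split(":")  # the space after the colon stays in the second field
        ARR.append(s)
print(ARR)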

output

$ ./python.py
[['person 1', ' name1'], ['person 2', ' name2'], ['person 3', ' name3'], ['person N', ' nameN']]
ghostdog74