ansaurus

Question

Find out number of capture groups in Python regular expressions

Answer 1

A:

First of all, if you only need the first result of re.findall, it is better to use re.search, which returns a match object or None.

To get the number of groups, you could count the opening parentheses '(' that are not escaped by a backslash. Another regex can do that:

import re

def num_of_groups(regexp):
    # Match every '(' that is not directly preceded by a backslash.
    rg = re.compile(r'(?<!\\)\(')
    return len(rg.findall(regexp))

Note that this gives the wrong count if the regex contains non-capturing groups such as '(?:...)', or if a literal '(' is written inside a character class, as in '[(]'. So it is not very reliable, but depending on the regexes you use it might help.

rslite 2008-09-24 13:21:55
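
The two failure cases mentioned in Answer 1 can be seen in a short sketch. The function repeats the counter from that answer; the sample patterns are made up here for illustration only.

```python
import re

def num_of_groups(regexp):
    # Count every '(' that is not directly preceded by a backslash.
    rg = re.compile(r'(?<!\\)\(')
    return len(rg.findall(regexp))

# Plain capture groups are counted correctly: 2 groups, counted as 2.
simple = num_of_groups(r'(\d)(\d)')

# An escaped parenthesis is skipped as intended: 1 group, counted as 1.
escaped = num_of_groups(r'\((\d)\)')

# A non-capturing group is counted by mistake: 1 group, counted as 2.
non_capturing = num_of_groups(r'(?:\d)(\d)')

# A literal '(' in a character class is counted by mistake: 1 group, counted as 2.
char_class = num_of_groups(r'[(](\d)')
```

The last two results are each one higher than the real number of capture groups, which is why this approach is only safe for simple patterns.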

Answer 2

A:

The lastindex property of the match object should be what you are looking for. See the re module docs.

agnul 2008-09-24 13:22:27

If no match is found, I don't have a match object. Plus, I don't think that's what lastindex does.

itsadok 2008-09-25 08:04:04
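
To illustrate the objection in the comment, here is a small sketch comparing lastindex on a match object with the groups attribute of a compiled pattern. The pattern and input strings are invented for the example.

```python
import re

pattern = re.compile(r'(a)(b)?(c)?')

# Number of capture groups defined by the pattern; no match is needed.
total = pattern.groups

m = pattern.search('a')
# Index of the last group that actually took part in this particular match.
last = m.lastindex

# Without a match, search returns None, so there is nothing to inspect.
no_match = pattern.search('xyz')
```

Here total is 3 while last is 1, because only the first group matched, and no_match is None.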

Answer 3

+2 A:

Something from inside sre_parse might help.

At first glance, maybe something along the lines of:

>>> import sre_parse
>>> sre_parse.parse('(\d)\d(\d)')
[('subpattern', (1, [('in', [('category', 'category_digit')])])), 
('in', [('category', 'category_digit')]), 
('subpattern', (2, [('in', [('category', 'category_digit')])]))]

I.e. count the items of type 'subpattern':

import sre_parse

def count_patterns(regex):
    """
    >>> count_patterns('foo: \d')
    0
    >>> count_patterns('foo: (\d)')
    1
    >>> count_patterns('foo: (\d(\s))')
    1
    """
    parsed = sre_parse.parse(regex)
    return len([token for token in parsed if token[0] == 'subpattern'])

Note that we're only counting root-level patterns here, so the last example only returns 1. To change this, the tokens would need to be searched recursively.

miracle2k 2008-09-24 13:23:34

Answer 4

A:

I might be wrong, but I don't think there is a way to find the number of groups that would have been returned had the regex matched. The only way I can think of to make this work the way you want is to pass the number of groups your particular regex expects as an argument.

To clarify though: When findall succeeds, you only want the first match to be returned, but when it fails you want a list of empty strings? Because the comment seems to show all matches being returned as a list.

Adam Bellaire 2008-09-24 13:25:54

Answer 5

A:

Using your code as a basis:

def groups(regexp, s):
    """ Returns the first result of re.findall, or an empty default

    >>> groups(r'(\d)(\d)(\d)', '123')
    ('1', '2', '3')
    >>> groups(r'(\d)(\d)(\d)', 'abc')
    ('', '', '')
    """
    import re
    m = re.search(regexp, s)
    if m:
        return m.groups()
    return ('',) * len(m.groups())

Will Boyce 2008-09-24 14:12:57

This will throw an exception when no match is found

itsadok 2008-09-25 07:57:36
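
One possible way around the exception, as a sketch: take the group count from the compiled pattern rather than from the match object, which is None when nothing matches. This assumes that one empty string per capture group is the wanted default, as in the docstring of Answer 5.

```python
import re

def groups(regexp, s):
    """Return the groups of the first match, or empty strings if there is none."""
    pattern = re.compile(regexp)
    m = pattern.search(s)
    if m:
        return m.groups()
    # pattern.groups is the number of capture groups defined in the regex;
    # it is available even when nothing matched.
    return ('',) * pattern.groups

matched = groups(r'(\d)(\d)(\d)', '123')
unmatched = groups(r'(\d)(\d)(\d)', 'abc')
```

With these inputs, matched is ('1', '2', '3') and unmatched is ('', '', '').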

Answer 6

+4 A:

import re

def num_groups(regex):
    pattern = re.compile(r"(?<!\\)(?:\\\\)*(?:\[(?:\\.|[^\\\]])*\]|(\()(?!\?(?!P<)))")
    # Count only the matches where group 1 (a capturing '(') took part.
    return len([1 for x in re.finditer(pattern, regex) if x.group(1)])

It looks for an unescaped '[' or '('. For '[', it skips ahead to the next unescaped ']'. The '(' must not be followed by '?', unless that in turn is followed by 'P<' (a named group). It then filters for the capturing groups and counts them.

Looking for character classes is also necessary, because '(' can appear unescaped inside them. Using look-arounds to detect them is not possible, since look-behinds need to be of fixed length.

EDIT: (4 months later)

Too simple!

def num_groups(regex):
    return re.compile(regex).groups

MizardX 2008-09-25 21:18:49

+1 for the edited version :-)

Carl Meyer 2009-01-31 15:50:10
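
As a quick check of the edited version, the sketch below runs it against a few patterns that trip up hand-written counters. The sample patterns are illustrative and not taken from the question.

```python
import re

def num_groups(regex):
    # A compiled pattern exposes its number of capture groups directly.
    return re.compile(regex).groups

nested = num_groups(r'foo: (\d(\s))')            # nested groups both count
non_capturing = num_groups(r'(?:\d)(\d)')        # '(?:...)' is not counted
char_class = num_groups(r'[(](\d)')              # literal '(' in a class is ignored
lookahead = num_groups(r'(?=x)(y)')              # look-ahead is not a capture group
named = num_groups(r'(?P<year>\d{4})-(\d{2})')   # named groups count as well

# groupindex maps group names to their numbers.
names = dict(re.compile(r'(?P<year>\d{4})-(\d{2})').groupindex)
```

The counts come out as 2, 1, 1, 1 and 2 respectively, and names is {'year': 1}. Because the regex engine's own parser does the counting, no special handling of escapes or character classes is needed.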
