ansaurus

Question

Maximum match length of a regular expression

Answer 1

+5 A:

import invRegex
data='foo(bar{2,3}|potato)'    
print(list(invRegex.invert(data)))
# ['foobarr', 'foobarrr', 'foopotato']    
print(max(map(len,invRegex.invert(data))))
# 9

Another alternative is to use ipermute from this module.

import inverse_regex
data='foo(bar{2,3}|potato)'
print(list(inverse_regex.ipermute(data)))
# ['foobarr', 'foobarrr', 'foopotato']
print(max(map(len,inverse_regex.ipermute(data))))
# 9

unutbu 2010-10-31 14:35:32

The latter falls apart on such a simple expression as `\w{1,10}`. It does suggest how to do this properly, though, using the `sre_parse` module, and the ipermute code is a good starting point. I'm not sure whether sre_parse is a public API, though; it doesn't seem to be documented, so be careful.

Glenn Maynard 2010-10-31 15:08:58

nice, I didn't know about this

bronzebeard 2010-10-31 16:12:48

Answer 2

+1 A:

Solved, I think. Thanks to unutbu for pointing me to sre_parse!

import sre_parse

def get_regex_max_match_len(regex):
    minlen, maxlen = sre_parse.parse(regex).getwidth()
    if maxlen >= sre_parse.MAXREPEAT: raise ValueError('unbounded regex')
    return maxlen

Results in:

>>> get_regex_max_match_len('foo((bar){2,3}|potato)')
12
>>> get_regex_max_match_len('.*')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in get_regex_max_match_len
ValueError: unbounded regex

adw 2010-10-31 17:35:32

ansaurus

tags:

views:

answers:

Maximum match length of a regular expression

related questions