I'm trying to extract a set of data from a string that can match one of three patterns. I have a list of compiled regexes. I want to run through them (in order) and go with the first match.

regexes = [
    compiled_regex_1,
    compiled_regex_2,
    compiled_regex_3,
]

m = None
for reg in regexes:
    m = reg.match(name)
    if m: break

if not m:
    print 'ARGL NOTHING MATCHES THIS!!!'

This should work (haven't tested yet) but it's pretty fugly. Is there a better way of boiling down a loop that breaks when it succeeds or explodes when it doesn't?

There might be something specific to re that I don't know about that allows you to test multiple patterns too.

+5  A: 

You can use the else clause of the for loop:

for reg in regexes:
    m = reg.match(name)
    if m: break
else:
    print 'ARGL NOTHING MATCHES THIS!!!'
Nathon
+1 for correct, but I've gotten the impression that the for-else construct is Considered Confusing, and despite the many cases where it is exactly what you want it seems frowned upon (but I'd love to be refuted).
msw
Didn't know that one. Although my eyes always associate `else` with `try`; it catches me out with `try`...`except` statements.
Beau Martínez
I've found out about `for..else` at least three times now... And I keep forgetting it. It's just not good naming, but it does work perfectly. Thanks.
Oli
Hmm. In my real code, I throw a `continue` to a higher-up `for` loop after printing my "ARGL..." message. Is that going to get screwed up in the scope of a `for..else`?
Oli
If Python wants clear code, it won't get it with `for..else`.
mathepic
@Oli, I don't think so. The else only happens when the for loop whose scope it matches finishes normally and it sounds like that will still happen (but not for the outer loop). @msw, I don't know about "Considered Confusing", but I find it useful and it saves some `continue` and `break` statements while making it obvious what it does. If it weren't part of Guido's great plan, he would have left it out of py3k.
Nathon
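
To illustrate the point about `continue` discussed in the comments above, here is a minimal sketch (not from the thread). The function name `extract_all` and the two patterns are made up for the example. Because the `else` clause sits outside the inner loop's body, a `continue` placed there applies to the enclosing outer loop:

```python
import re

# Hypothetical patterns, tried in order
regexes = [
    re.compile(r"(\d+)-(\d+)"),
    re.compile(r"([a-z]+)_(\d+)"),
]

def extract_all(names):
    """Collect the groups of the first matching regex for each name."""
    results = []
    for name in names:
        for reg in regexes:
            m = reg.match(name)
            if m:
                break
        else:
            # Reached only when the inner loop finished without a break.
            # This continue skips to the next name in the outer loop.
            continue
        results.append((name, m.groups()))
    return results
```

Names that match nothing are skipped, and the code after the inner loop only runs when `m` holds a real match.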
+1  A: 

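Since you have a finite set in this case, you could use short-circuit evaluation:

# The parentheses let the expression span several lines.
# print must be the function form here (Python 3, or
# from __future__ import print_function); it returns None.
m = (compiled_regex_1.match(name) or
     compiled_regex_2.match(name) or
     compiled_regex_3.match(name) or
     print("ARGHHHH!"))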
Eric
+2  A: 

If you just want to know if any of the regex match then you could use the builtin any function:

if any(reg.match(name) for reg in regexes):
     ....

however this will not tell you which regex matched.

Alternatively you can combine multiple patterns into a single regex with |:

regex = re.compile(r"(regex1)|(regex2)|...")

Again, this will not directly tell you which regex matched, but you will have a match object that you can use for further information. For example, you can find out which regex succeeded from the group that is not None:

>>> match = re.match("(a)|(b)|(c)|(d)", "c")
>>> match.groups()
(None, None, 'c', None)

However, this can get complicated if any of the sub-regexes have groups in them as well, since the group numbering will change.

This is probably faster than matching each regex individually since the regex engine has more scope for optimising the regex.
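
As a small sketch of the numbering issue mentioned above (the pattern is invented for illustration): when the first alternative contains a nested group, the second alternative is no longer group 2.

```python
import re

# First alternative has an inner group, so it occupies groups 1 and 2;
# the second alternative therefore becomes group 3.
m = re.match(r"(a(x))|(b)", "b")

groups = m.groups()        # one entry per capturing group
matched_index = m.lastindex  # index of the last group that matched
```

Here the position of the non-None entry in `groups` no longer equals the position of the alternative in the pattern, so any bookkeeping based on group numbers has to account for inner groups.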

Dave Kirby
A: 

I use something like Dave Kirby suggested, but add named groups to the regexps, so that I know which one matched.

regexps = {
  'first': r'...',
  'second': r'...',
}

compiled = re.compile('|'.join('(?P<%s>%s)' % item for item in regexps.iteritems()))
match = compiled.match(my_string)
print match.lastgroup
Radomir Dopieralski
Beware that the order the regex are tried in will be undefined, which may produce unexpected results.
Dave Kirby
Right, because I used a dict. If you use a list of tuples instead, or sort regexps.items in that list comprehension, then it's well-defined.
Radomir Dopieralski
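
A minimal sketch of the list-of-tuples variant suggested in the comment above, so the alternatives are tried in a well-defined order. The group names, patterns, and the helper `which` are hypothetical, chosen only for the example:

```python
import re

# Ordered (name, pattern) pairs; earlier entries are tried first
patterns = [
    ("number", r"\d+"),
    ("word", r"[a-z]+"),
]

# Wrap each pattern in a named group and join with |
combined = re.compile("|".join("(?P<%s>%s)" % pair for pair in patterns))

def which(s):
    """Return (group name, matched text), or None if nothing matches."""
    m = combined.match(s)
    return (m.lastgroup, m.group()) if m else None
```

`which("42")` reports the `number` alternative and `which("abc")` reports `word`; the order of the list decides which pattern wins if more than one could match.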
A: 

Eric is on a better track in seeing the bigger picture of what the OP is aiming for, though I would use if/else instead. I also think that using the print function in an or expression is a little questionable. +1 to Nathon for correcting the OP to use a proper else clause.

Then my alternative:

# alternative to any builtin that returns useful result,
# the first considered True value
def first(seq):
    for item in seq:
        if item: return item

regexes = [
    compiled_regex_1,
    compiled_regex_2,
    compiled_regex_3,
]

m = first(reg.match(name) for reg in regexes)
print(m if m else 'ARGL NOTHING MATCHES THIS!!!')
Tony Veijalainen
+1  A: 

In Python 2.6 or better:

import itertools as it

m = next(it.ifilter(None, (r.match(name) for r in regexes)), None)

The ifilter call could be made into a genexp, but only a bit awkwardly, i.e., with the usual trick for name binding in a genexp (aka the "phantom nested for clause idiom"):

m = next((m for r in regexes for m in (r.match(name),) if m), None)

but itertools is generally preferable where applicable.

The bit needing 2.6 is the next built-in, which lets you specify a default value if the iterator is exhausted. If you have to simulate it in 2.5 or earlier,

def next(itr, deft):
  try: return itr.next()
  except StopIteration: return deft
Alex Martelli
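
For readers on Python 3, where `itertools.ifilter` no longer exists and the built-in `filter` is already lazy, the same idea can be sketched as follows. This is an adaptation, not part of the answer above; the patterns and the name `first_match` are made up for the example:

```python
import re

# Hypothetical patterns, tried in order
regexes = [re.compile(r"\d+"), re.compile(r"[a-z]+")]

def first_match(name):
    """Return the first truthy match object, or None if no regex matches."""
    # filter(None, ...) drops falsy items (the None results of failed matches);
    # next(..., None) stops at the first survivor and supplies a default.
    return next(filter(None, (r.match(name) for r in regexes)), None)
```

Because both the generator expression and `filter` are lazy, regexes after the first successful one are never tried.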