views:

122

answers:

7

Hi, I need to test a string against multiple regexes (mutually exclusive, meaning a string that matches one of them can't match any of the others), and execute a different piece of code depending on which one it matches. What I have currently is:

m = firstre.match(str)
if m:
    # Do something

m = secondre.match(str)
if m:
    # Do something else

m = thirdre.match(str)
if m:
    # Do something different from both

Apart from the ugliness, this code matches against all regexes even after it has matched one of them (say firstre), which is inefficient. I tried to use:

elif m = secondre.match(str)

but learnt that assignment is not allowed in if statements.

Is there an elegant way to achieve what I want?

+1  A: 

A few ideas, none of them necessarily good, but one might fit your code well:

How about putting the code in a separate function, e.g. MatchRegex(), which returns which regex matched? Inside the function you can return as soon as the first (or second) regex matches, which removes the inefficiency.
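
A minimal sketch of that function idea. The patterns below are hypothetical stand-ins for firstre, secondre and thirdre, and `match_regex` and its labels are illustrative names, not something from the question:

```python
import re

# Hypothetical patterns standing in for firstre, secondre and thirdre.
firstre = re.compile(r"foo")
secondre = re.compile(r"bar")
thirdre = re.compile(r"baz")

def match_regex(s):
    """Return (label, match) for the first regex that matches, else (None, None)."""
    m = firstre.match(s)
    if m:
        return "first", m   # early return: later regexes are never tried
    m = secondre.match(s)
    if m:
        return "second", m
    m = thirdre.match(s)
    if m:
        return "third", m
    return None, None

label, m = match_regex("barista")
if label == "second":
    print("second regex matched:", m.group(0))
```

The caller can then branch on the label while still having the match object available.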

Of course, you could always go with just nested if statements:

m = firstre.match(str)
if m:
    # Do something
else:
    m = secondre.match(str)
    ...

I really don't see any reason not to go with nested ifs. They're very easy to understand and as efficient as you want. I'd go for them just for their simplicity.

Edan Maor
+1 for suggesting the simple solution for the problem
João Portela
What if there are a few hundred regexes? The code would be hardly readable with more than ten or so.
kibitzer
@kibitzer: In that case, or if the code is expected to grow to that size, it makes sense to engineer a general solution. But you don't need one every time you have to write 3 nested ifs.
Edan Maor
+4  A: 
import re

def doit( s ):

    # with some side-effect on a
    a = []

    def f1( s, m ):
        a.append( 1 )
        print( 'f1', a, s, m )

    def f2( s, m ):
        a.append( 2 )
        print( 'f2', a, s, m )

    def f3( s, m ):
        a.append( 3 )
        print( 'f3', a, s, m )

    re1 = re.compile( 'one' )
    re2 = re.compile( 'two' )
    re3 = re.compile( 'three' )

    # (handler, regex) pairs, tried in order
    func_re_list = (
        ( f1, re1 ),
        ( f2, re2 ),
        ( f3, re3 ),
    )
    for myfunc, myre in func_re_list:
        m = myre.match( s )
        if m:
            myfunc( s, m )
            break  # stop at the first regex that matches


doit( 'one' )
doit( 'two' )
doit( 'three' )
Bluebird75
+1 for pure pythonic awesomeness. Personally, I would put the list of tuples outside the for statement e.g. `match_functions = ((f1,re1),(f2,re2),..)` and do `for myfunc,myre in match_functions:`
Kimvais
Don't forget to add "break" to save trying to match against the rest of the list.
Ofri Raviv
Edited to include the suggestions from the comments, plus a real example.
Bluebird75
I finally implemented a solution similar to this. In my case, I was able to refactor 3 of the 4 cases into a function. So I matched the first regex alone directly, and if it didn't match, went through the other 3 regexes, and called the function with the appropriate arguments. For calling the function with different arguments depending on the regex, I made a dict of (regex: (arg1, arg2)). The code is (IMHO at least) more elegant than before now. Thanks a lot.
sundar
+3  A: 

This might be over-engineering the solution a bit, but you could combine them into a single regexp with named groups and see which group matched. This could be encapsulated in a helper class:

import re
class MultiRe(object):
    def __init__(self, **regexps):
        self.keys = regexps.keys()
        self.union_re = re.compile("|".join("(?P<%s>%s)" % kv for kv in regexps.items()))

    def match(self, string, *args):
        result = self.union_re.match(string, *args)
        if result:
            for key in self.keys:
                if result.group(key) is not None:
                    return key

Lookup would be like this:

multi_re = MultiRe(foo='fo+', bar='ba+r', baz='ba+z')
match = multi_re.match('baaz')
if match == 'foo':
    # one thing
elif match == 'bar':
    # some other thing
elif match == 'baz':
    # or this
else:
    # no match
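
A possible variation on the same idea, sketched under the assumption that the individual patterns contain no named groups of their own: the match object's `lastgroup` attribute names the alternative that matched, so it can index a dict of handlers directly. The handler functions and the `dispatch` name are hypothetical:

```python
import re

# One named group per alternative; the names double as dispatch keys.
union_re = re.compile(r"(?P<foo>fo+)|(?P<bar>ba+r)|(?P<baz>ba+z)")

def handle_foo(m):
    return "foo handler got " + m.group()

def handle_bar(m):
    return "bar handler got " + m.group()

def handle_baz(m):
    return "baz handler got " + m.group()

HANDLERS = {"foo": handle_foo, "bar": handle_bar, "baz": handle_baz}

def dispatch(s):
    """Run the handler for whichever alternative matched; None if none did."""
    m = union_re.match(s)
    if m is None:
        return None
    return HANDLERS[m.lastgroup](m)

print(dispatch("baaz"))
```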
Ants Aasma
Nice! (min 15 char)
Bart Kiers
This looks over-engineered from my point of view. I don't find the code really easy to understand.
Bluebird75
A: 

Early returns, perhaps?

def doit(s):
    m = re1.match(s)
    if m:
        # Do something
        return

    m = re2.match(s)
    if m:
        # Do something else
        return

    ...

Ants Aasma's answer is good too. If you prefer less scaffolding you can write that out yourself using the verbose regex syntax.

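import re

# Named combined_re so that it does not shadow the re module.
combined_re = re.compile(r'''(?x)    # set the verbose flag
    (?P<foo> fo+ )
  | (?P<bar> ba+r )
  | #...other alternatives...
''')

def doit(s):
    m = combined_re.match(s)
    if m is None:
        return  # nothing matched
    if m.group('foo'):
        # Do something
    elif m.group('bar'):
        # Do something else
    ...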

I've done this a lot. It's fast and it works with re.finditer.
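
A small sketch of the re.finditer usage, with hypothetical `number` and `word` alternatives chosen only for illustration. Each match reports which alternative produced it through `lastgroup`:

```python
import re

# Hypothetical alternatives; verbose mode ignores the layout whitespace.
token_re = re.compile(r'''(?x)
    (?P<number> \d+ )
  | (?P<word>   [A-Za-z]+ )
''')

def classify(text):
    """Return (alternative_name, matched_text) for every match in text."""
    return [(m.lastgroup, m.group()) for m in token_re.finditer(text)]

print(classify("abc 123 de"))
```

Characters that match no alternative (the spaces here) are simply skipped by finditer.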

Jason Orendorff
A: 

Use an elif chain if you only need a True/False result from the regex match:

if regex1.match(str):
    # do stuff
elif regex2.match(str):
    # and so on
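
If the match object is needed as well, Python 3.8 and later allow an assignment expression (`:=`) inside the condition. A sketch assuming that Python version, with hypothetical patterns and a hypothetical `describe` function:

```python
import re

# Hypothetical patterns for illustration.
regex1 = re.compile(r"(\d+)")
regex2 = re.compile(r"([a-z]+)")

def describe(s):
    # Requires Python 3.8+: := binds m and tests it in one step.
    if m := regex1.match(s):
        return "number " + m.group(1)
    elif m := regex2.match(s):
        return "word " + m.group(1)
    return "no match"

print(describe("42abc"))
print(describe("abc"))
```

Later regexes are not evaluated once an earlier branch has matched.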
kibitzer
I think he needs the return value from regex.match(str)
João Portela
+1  A: 

You could use

def do_first(str, res, actions):
    # regex, not re, so the loop variable does not shadow the re module
    for regex, action in zip(res, actions):
        m = regex.match(str)
        if m:
            action(str)
            return

So, for example, say you've defined

import re

def do_something_1(str):
    print("#1: %s" % str)

def do_something_2(str):
    print("#2: %s" % str)

def do_something_3(str):
    print("#3: %s" % str)

firstre  = re.compile("foo")
secondre = re.compile("bar")
thirdre  = re.compile("baz")

Then call it with

do_first("baz",
         [firstre,        secondre,       thirdre],
         [do_something_1, do_something_2, do_something_3])
Greg Bacon
+3  A: 

This is a good application for the undocumented but quite useful re.Scanner class.
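
A rough sketch of how re.Scanner can be used for this. Since the class is undocumented, treat the details as an assumption based on its current behaviour in CPython: it takes a list of (pattern, action) pairs, and each action is called with the scanner and the matched text. The callback names are hypothetical:

```python
import re

# Each callback receives the scanner instance and the matched text.
def on_foo(scanner, token):
    return ("foo", token)

def on_bar(scanner, token):
    return ("bar", token)

def on_baz(scanner, token):
    return ("baz", token)

scanner = re.Scanner([
    (r"fo+", on_foo),
    (r"ba+r", on_bar),
    (r"ba+z", on_baz),
])

# scan() returns the collected action results and any unmatched remainder.
results, remainder = scanner.scan("baaz")
print(results, repr(remainder))
```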

Robert Rossney
Nice! Thanks for the link.
Brandon Corfman