Is there a way in Python to access match groups without explicitly creating a match object (or another way to beautify the example below)?

Here is an example to clarify my motivation for the question:

The following Perl code

if    ($statement =~ /I love (\w+)/) {
  print "He loves $1\n";
}
elsif ($statement =~ /Ich liebe (\w+)/) {
  print "Er liebt $1\n";
}
elsif ($statement =~ /Je t\'aime (\w+)/) {
  print "Il aime $1\n";
}

translated into Python

import re

m = re.search(r"I love (\w+)", statement)
if m:
  print "He loves",m.group(1)
else:
  m = re.search(r"Ich liebe (\w+)", statement)
  if m:
    print "Er liebt",m.group(1)
  else:
    m = re.search(r"Je t'aime (\w+)", statement)
    if m:
      print "Il aime",m.group(1)

looks very awkward (if-else-cascade, match object creation).

+5  A: 

Less efficient, but simpler-looking:

m0 = re.match(r"I love (\w+)", statement)
m1 = re.match(r"Ich liebe (\w+)", statement)
m2 = re.match(r"Je t'aime (\w+)", statement)
if m0:
  print "He loves",m0.group(1)
elif m1:
  print "Er liebt",m1.group(1)
elif m2:
  print "Il aime",m2.group(1)

The problem with the Perl version is that each match implicitly updates hidden variables ($1 and friends). That is hard to reproduce in Python, where updating a variable requires an explicit assignment statement.

The version with less repetition (and better efficiency) is this:

pats = [
    (r"I love (\w+)", "He loves {0}"),
    (r"Ich liebe (\w+)", "Er liebt {0}"),
    (r"Je t'aime (\w+)", "Il aime {0}"),
]
for pattern, template in pats:
    m = re.match(pattern, statement)
    if m:
        # Stop at the first pattern that matches.
        print template.format(m.group(1))
        break

A minor variation that some Perl folk prefer:

pats = {
    r"I love (\w+)": "He loves {0}",
    r"Ich liebe (\w+)": "Er liebt {0}",
    r"Je t'aime (\w+)": "Il aime {0}",
}
for pattern in pats:
    m = re.match(pattern, statement)
    if m:
        print pats[pattern].format(m.group(1))
        break

This is hardly worth mentioning, except that Perl programmers sometimes ask for it.
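For anyone who wants to reuse the table-driven loop, here is a minimal sketch that wraps it in a function. It is written in Python 3 syntax (unlike the Python 2 snippets above), the name `translate_with_table` is made up for this example, and it uses `search` rather than `match` so that, like Perl's `=~`, the pattern may match anywhere in the string:

```python
import re

# Patterns are compiled once; each is paired with its output template.
PATTERNS = [
    (re.compile(r"I love (\w+)"), "He loves {0}"),
    (re.compile(r"Ich liebe (\w+)"), "Er liebt {0}"),
    (re.compile(r"Je t'aime (\w+)"), "Il aime {0}"),
]


def translate_with_table(statement):
    """Return the formatted text for the first matching pattern, else None."""
    for pattern, template in PATTERNS:
        m = pattern.search(statement)  # unanchored, like Perl's =~
        if m:
            return template.format(m.group(1))
    return None


print(translate_with_table("Ich liebe Margot"))
```

Because the function returns at the first hit, later patterns are never tried once an earlier one matches.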

S.Lott
@S.Lott: OK, your solution avoids the if-else cascade, but at the expense of doing unnecessary matches (m1 and m2 are not needed if m0 matches); that's why I am not really satisfied with this solution.
Curd
+1 I like the second version better...
Curd
+1 for your second way
gnibbler
+1  A: 

This is not a regex solution.

alist = {"I love ": "He loves", "Je t'aime ": "Il aime", "Ich liebe ": "Er liebt"}
for k in alist:
    if k in statement:
        # Print everything after the first occurrence of the phrase.
        print alist[k], statement.split(k, 1)[1]
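The same substring idea can be sketched with `str.partition`, which splits once and reports whether the phrase was found. This is Python 3 syntax, the names `PREFIXES` and `translate_without_regex` are made up for this example, and taking the first whitespace-delimited word only approximates `(\w+)`: trailing punctuation such as "Mary!" is kept.

```python
PREFIXES = {
    "I love ": "He loves",
    "Ich liebe ": "Er liebt",
    "Je t'aime ": "Il aime",
}


def translate_without_regex(statement):
    """Return the translated phrase plus the next word, or None if no phrase is found."""
    for prefix, replacement in PREFIXES.items():
        _before, sep, after = statement.partition(prefix)
        if sep:  # sep is empty when the prefix does not occur
            words = after.split()
            if words:
                return replacement + " " + words[0]
    return None


print(translate_without_regex("Je t'aime Marie"))
```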
ghostdog74
A: 

You could create a helper function:

def re_match_group(pattern, string, out_groups):
    # Clear the caller's list in place, then fill it with the new groups.
    del out_groups[:]
    result = re.match(pattern, string)
    if result:
        out_groups[:] = result.groups()
    return result

And then use it like this:

groups = []
if re_match_group(r"I love (\w+)", statement, groups):
    print "He loves", groups[0]
elif re_match_group(r"Ich liebe (\w+)", statement, groups):
    print "Er liebt", groups[0]
elif re_match_group(r"Je t'aime (\w+)", statement, groups):
    print "Il aime", groups[0]

It's a little clunky, but it gets the job done.

Adam Rosenfield
+2  A: 

You could create a little class that returns the boolean result of calling match, and retains the matched groups for subsequent retrieval:

import re

class REMatcher(object):
    def __init__(self, matchstring):
        self.matchstring = matchstring

    def match(self,regexp):
        self.rematch = re.match(regexp, self.matchstring)
        return bool(self.rematch)

    def group(self,i):
        return self.rematch.group(i)


for statement in ("I love Mary", 
                  "Ich liebe Margot", 
                  "Je t'aime Marie", 
                  "Te amo Maria"):

    m = REMatcher(statement)

    if m.match(r"I love (\w+)"): 
        print "He loves",m.group(1) 

    elif m.match(r"Ich liebe (\w+)"):
        print "Er liebt",m.group(1) 

    elif m.match(r"Je t'aime (\w+)"):
        print "Il aime",m.group(1) 

    else: 
        print "???"
Paul McGuire
+1 nice solution; though a little bit verbose
Curd
It might be verbose, but you'll put the REMatcher class in a nice module which you'll import whenever needed. You wouldn't ask this question for an issue that won't come up again in the future, would you?
ΤΖΩΤΖΙΟΥ
@ΤΖΩΤΖΙΟΥ: I agree; but, why isn't such a class in module re yet?
Curd
@Curd: because you're the one to bring it up. Thousands of other submitters to the Python code base have lived fine without it, so *why* should there be such a class in the re module? In any case, if you think such functionality belongs in the re module, you're more than welcome to supply a patch. Otherwise, please refrain from asking "why aren't things like I think they should be?" questions, because they are non-productive.
ΤΖΩΤΖΙΟΥ
@ΤΖΩΤΖΙΟΥ: I disagree. Being satisfied by the fact that "thousands of others" didn't consider introducing it is just silly. How can I be sure that there is no good reason not to have such a class if I don't ask "Why"? I don't see one, but maybe somebody else does and can explain it (and thus give a better insight into the philosophy of Python). Here is a good example showing that such questions are productive: http://stackoverflow.com/questions/837265/why-is-there-no-operator-in-c-c
Curd
“Why” questions are generally productive, but your question falls in the subcategory “Why not how *I* like” (emphasis on “how I like”), which cannot be answered. You consider that such a function/class would be most useful, and then ask why others haven't acted upon it. For a change to occur, the motivated (here: you) has to justify the change to the rest of the community (here: the Python community). It's quite self-centered and non-productive to ask the community why your desired change hasn't already been introduced.
ΤΖΩΤΖΙΟΥ
Anyway, I already answered your question, but I can rephrase if you need me to: the feature you ask for is very easy to implement, just like the itertools recipes in the documentation, and those are far more generic (and therefore stdlib-worthy) than your desired change. BTW, you might have noticed that *by design* assignments in Python are not expressions; if they were, that would have solved your issue.
ΤΖΩΤΖΙΟΥ
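A note for readers on newer versions: Python 3.8 added assignment expressions (the `:=` operator, PEP 572), so the point about assignments not being expressions no longer holds there. A minimal sketch, assuming Python 3.8 or later; the function name `translate_with_walrus` is made up for this example:

```python
import re


def translate_with_walrus(statement):
    """Mirror the Perl if/elsif chain using assignment expressions (Python 3.8+)."""
    if m := re.search(r"I love (\w+)", statement):
        return "He loves " + m.group(1)
    elif m := re.search(r"Ich liebe (\w+)", statement):
        return "Er liebt " + m.group(1)
    elif m := re.search(r"Je t'aime (\w+)", statement):
        return "Il aime " + m.group(1)
    return None


print(translate_with_walrus("I love Mary"))
```

Each pattern is tried only if the previous ones failed, so there is no nesting and no unnecessary matching.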