I'm pretty experienced with Perl and Ruby but new to Python so I'm hoping someone can show me the Pythonic way to accomplish the following task. I want to compare several lines against multiple regular expressions and retrieve the matching group. In Ruby it would be something like this:

# Revised to show variance in regex and related action.
data, foo, bar = [], nil, nil
input_lines.each do |line|
  if line =~ /Foo(\d+)/
    foo = $1.to_i
  elsif line =~ /Bar=(.*)$/
    bar = $1
  elsif bar
    data.push(line.to_f)
  end
end

My attempts in Python are turning out pretty ugly: the matching group has to be pulled out of the object returned by a call to match/search on a regular expression, and Python has neither assignment in conditionals nor a switch statement. What's the Pythonic way to approach (or think about!) this problem?

A: 

Something like this, but prettier:

import re

regexes = [re.compile('...'), ...]

for regex in regexes:
  m = regex.match(s)
  if m:
    print m.groups()
    break
else:
  print 'No match'
Ignacio Vazquez-Abrams
I tried something similar, but I want to take different actions depending on which regex matches. I moved from a list to a dictionary mapping the regexes to lambdas that get called when a match is found, but that makes for some confusing code...
maerics
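For what it's worth, the regex-to-action mapping described in the comment above tends to read better with named functions instead of lambdas and an ordered list instead of a dictionary. A minimal sketch, assuming Python 3 and the two patterns from the question; `handle_foo`, `handle_bar`, `HANDLERS`, and `dispatch` are hypothetical names, not something from this thread:

```python
import re

def handle_foo(match, state):
    # Mirrors `foo = $1.to_i` from the Ruby version.
    state["foo"] = int(match.group(1))

def handle_bar(match, state):
    # Mirrors `bar = $1` from the Ruby version.
    state["bar"] = match.group(1)

# Ordered (pattern, handler) pairs; the first pattern that matches wins.
HANDLERS = [
    (re.compile(r"Foo(\d+)"), handle_foo),
    (re.compile(r"Bar=(.*)$"), handle_bar),
]

def dispatch(line, state):
    """Call the handler of the first matching pattern; return True if one matched."""
    for pattern, handler in HANDLERS:
        match = pattern.search(line)  # search, like Ruby's =~
        if match:
            handler(match, state)
            return True
    return False

state = {}
for line in ["Foo42", "Bar=baz", "nothing here"]:
    if not dispatch(line, state):
        print("no match: " + line)
```

A list keeps the order in which patterns are tried explicit, which matters as soon as two patterns can match the same line.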
+1  A: 

There are several ways to "bind a name on the fly" in Python, such as my old recipe for "assign and test"; in this case I'd probably choose another such way (assuming Python 2.6, needs some slight changes if you're working with an old version of Python), something like:

import re
pats_mark = (r'^A:(.*)$', 'FOO'), (r'^B:(.*)$', 'BAR')
for line in lines:
    mo, m = next(((mo, m) for p, m in pats_mark for mo in [re.match(p, line)] if mo),
                 (None, None))
    if mo: print '%s: %s' % (m, mo.group(1))
    else: print 'NO MATCH: %s' % line

Many minor details can be adjusted, of course (for example, I just chose (.*) rather than (.*?) as the matching group -- they're equivalent given the immediately-following $ so I chose the shorter form;-) -- you could precompile the REs, factor things out differently than the pats_mark tuple (e.g., with a dict indexed by RE patterns), etc.

But the substantial ideas, I think, are to make the structure data-driven, and to bind the match object to a name on the fly with the subexpression for mo in [re.match(p, line)], a "loop" over a single-item list (genexps bind names only by loop, not by assignment -- some consider using this part of genexps' specs to be "tricky", but I consider it a perfectly acceptable Python idiom, esp. since it was considered back in the time when listcomps, genexps' "ancestors" in a sense, were being designed).

Alex Martelli
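To look at the `for mo in [re.match(p, line)]` trick in isolation, here is a small sketch of the same idea pulled into a function and written so that it runs on Python 3. The `first_match` helper and the `PATS_MARK` name are illustrative assumptions, not part of the answer above:

```python
import re

PATS_MARK = ((r"^A:(.*)$", "FOO"), (r"^B:(.*)$", "BAR"))

def first_match(line, pats_mark=PATS_MARK):
    """Return (match_object, marker) for the first matching pattern, else (None, None)."""
    candidates = (
        (mo, marker)
        for pattern, marker in pats_mark
        # A "loop" over a one-item list: its only job is to bind mo
        # so the filter and the result tuple can both use it.
        for mo in [re.match(pattern, line)]
        if mo
    )
    # The generator is lazy, so patterns after the first hit are never tried.
    return next(candidates, (None, None))

mo, marker = first_match("B:hello")
print(marker, mo.group(1))  # prints: BAR hello
```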
A: 

Your regex simply captures everything from the third character onward.

for line in open("file"):
    line = line.rstrip("\n")  # drop the newline so it doesn't end up inside the braces
    if line.startswith("A:"):
        print "FOO #{"+line[2:]+"}"
    elif line.startswith("B:"):
        print "BAR #{"+line[2:]+"}"
    else:
        print "No match"
ghostdog74
Nice way, but I'd use split and comparison: begin, rest = line.split(':', 1), then if begin == "A": etc...
moshez
This is good, but I'm looking for something more general. The simple regex is just for explanatory purposes; the actual regexes would be fairly complex.
maerics
A: 

Paul McGuire's solution of using an intermediate class REMatcher which performs the match, stores the match group, and returns a boolean for success/fail turned out to produce the most legible code for this purpose.

maerics
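For readers who don't have that answer in front of them, here is a rough sketch of what such a helper might look like, applied to the Ruby loop from the question. This is a guess at the general shape based only on the description above, not Paul McGuire's actual code; the method names `match` and `group` and the `parse` wrapper are assumptions:

```python
import re

class REMatcher(object):
    """Wraps one string, tries patterns against it, and remembers the last match."""

    def __init__(self, text):
        self.text = text
        self.last_match = None

    def match(self, pattern):
        # Store the match object (or None) and report success as a boolean,
        # so the call can sit directly in an if/elif condition.
        self.last_match = re.search(pattern, self.text)
        return self.last_match is not None

    def group(self, index):
        # Only meaningful after a successful match() call.
        return self.last_match.group(index)

def parse(input_lines):
    data, foo, bar = [], None, None
    for line in input_lines:
        m = REMatcher(line)
        if m.match(r"Foo(\d+)"):
            foo = int(m.group(1))
        elif m.match(r"Bar=(.*)$"):
            bar = m.group(1)
        elif bar:
            data.append(float(line))
    return data, foo, bar
```

The if/elif chain now has the same shape as the Ruby original, with `m.group(1)` standing in for `$1`.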