ansaurus

Question

Python comparing string against several regular expressions

Answer 1

A:

Something like this, but prettier:

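import re

# '...' and the trailing ... stand for the actual patterns
regexes = [re.compile('...'), ...]

for regex in regexes:
    m = regex.match(s)  # s is the string being tested
    if m:
        print(m.groups())
        break
else:
    # a for loop's else branch runs only if the loop never hit break
    print('No match')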

Ignacio Vazquez-Abrams 2010-04-13 22:55:31

I tried something similar, but I want to take different actions depending on which regex matches. So I moved from a list to a dictionary mapping each regex to a lambda to be called when it matches, but that makes for some confusing code...

maerics 2010-04-13 23:00:47
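One way to keep per-regex actions readable is to pair each compiled regex with a named handler function in an ordered list and stop at the first match, the same shape as the loop in Answer 1. This is a minimal sketch assuming the simple A:/B: patterns used in the other answers; the names handle_foo, handle_bar, DISPATCH and dispatch are hypothetical:

```python
import re

# Hypothetical handlers; in real code each would do the work for its pattern.
def handle_foo(match):
    return "FOO: " + match.group(1)

def handle_bar(match):
    return "BAR: " + match.group(1)

# Ordered (compiled pattern, handler) pairs; the first matching pattern wins.
# The patterns are placeholders borrowed from the simple example.
DISPATCH = [
    (re.compile(r"^A:(.*)$"), handle_foo),
    (re.compile(r"^B:(.*)$"), handle_bar),
]

def dispatch(line, table=DISPATCH, default=None):
    """Call the handler of the first pattern that matches line, else return default."""
    for regex, handler in table:
        m = regex.match(line)
        if m:
            return handler(m)
    return default

for line in ["A:one", "B:two", "C:three"]:
    print(dispatch(line, default="No match: " + line))
```

A list of pairs makes the matching order explicit. A dict would also work on Python 3.7+, where dicts keep insertion order, but on older versions the order in which the patterns are tried would not be guaranteed.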

Answer 2

A:

There are several ways to "bind a name on the fly" in Python, such as my old recipe for "assign and test"; in this case I'd probably choose another such way (assuming Python 2.6, needs some slight changes if you're working with an old version of Python), something like:

import re
pats_mark = (r'^A:(.*)$', 'FOO'), (r'^B:(.*)$', 'BAR')
for line in lines:  # lines: any iterable of strings, e.g. an open file
    mo, m = next(((mo, m) for p, m in pats_mark for mo in [re.match(p, line)] if mo),
                 (None, None))
    if mo: print('%s: %s' % (m, mo.group(1)))
    else: print('NO MATCH: %s' % line)

Many minor details can be adjusted, of course (for example, I just chose (.*) rather than (.*?) as the matching group -- they're equivalent given the immediately-following $ so I chose the shorter form;-) -- you could precompile the REs, factor things out differently than the pats_mark tuple (e.g., with a dict indexed by RE patterns), etc.
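The variations mentioned here (a dict indexed by RE patterns, precompiled REs) could look roughly like the following. It is a sketch assuming Python 3.7+, where dicts preserve insertion order so the patterns are tried in the order written; MARKS_BY_PATTERN, COMPILED and classify are illustrative names, not part of the answer:

```python
import re

# A dict keyed by the RE pattern strings; on Python 3.7+ iteration follows
# insertion order, so patterns are tried top to bottom.
MARKS_BY_PATTERN = {
    r"^A:(.*)$": "FOO",
    r"^B:(.*)$": "BAR",
}

# Compile each pattern once, up front.
COMPILED = [(re.compile(p), mark) for p, mark in MARKS_BY_PATTERN.items()]

def classify(line):
    """Return (match object, mark) for the first matching pattern, else (None, None)."""
    return next(((mo, m) for p, m in COMPILED for mo in [p.match(line)] if mo),
                (None, None))

for line in ["A:spam", "B:eggs", "C:ham"]:
    mo, m = classify(line)
    if mo:
        print("%s: %s" % (m, mo.group(1)))
    else:
        print("NO MATCH: %s" % line)
```

Since the re module already caches recently compiled patterns, precompiling mainly saves a cache lookup per call; its bigger benefit is keeping the compiled objects next to the data they belong to.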

But the substantial ideas, I think, are to make the structure data-driven, and to bind the match object to a name on the fly with the subexpression for mo in [re.match(p, line)], a "loop" over a single-item list (genexps bind names only by loop, not by assignment -- some consider using this part of genexps' specs to be "tricky", but I consider it a perfectly acceptable Python idiom, esp. since it was considered back in the time when listcomps, genexps' "ancestors" in a sense, were being designed).

Alex Martelli 2010-04-13 23:12:14
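The single-item-list trick described in that answer can be seen in isolation below. This is a small illustration with placeholder patterns and input; the second comprehension uses the assignment expression (:=), added in Python 3.8 long after this answer was written, which binds the name directly:

```python
import re

patterns = [r"^A:(.*)$", r"^B:(.*)$"]  # placeholder patterns
line = "B:eggs"                         # placeholder input

# A comprehension can bind names only through a for clause, so looping over a
# one-element list binds mo to the result of re.match exactly once per pattern.
matches = [mo for p in patterns for mo in [re.match(p, line)] if mo]
print(matches[0].group(1))  # eggs

# Python 3.8+: the assignment expression binds mo inside the condition.
matches2 = [mo for p in patterns if (mo := re.match(p, line))]
print(matches2[0].group(1))  # eggs
```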

Answer 3

A:

Your regex simply captures everything from the third character onwards, so plain string methods are enough:

with open("file") as f:
    for line in f:
        rest = line[2:].rstrip("\n")  # everything after the two-character prefix
        if line.startswith("A:"):
            print("FOO " + rest)
        elif line.startswith("B:"):
            print("BAR " + rest)
        else:
            print("No match")

ghostdog74 2010-04-13 23:31:59

Nice way, but I'd use split and comparison:

begin, rest = line.split(':', 1)
if begin == "A":
    # etc...

moshez 2010-04-13 23:33:29
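The split-and-compare suggestion can be sketched as a small function with a dict from prefix to label. It swaps in str.partition for str.split(':', 1), so that a line without a colon yields "No match" instead of raising ValueError during unpacking; LABELS and describe are hypothetical names, and the prefixes and labels are placeholders:

```python
# Placeholder prefix -> label table taken from the simple example.
LABELS = {"A": "FOO", "B": "BAR"}

def describe(line):
    line = line.rstrip("\n")
    # partition never raises: without a colon it returns (line, "", "").
    begin, sep, rest = line.partition(":")
    if sep and begin in LABELS:
        return "%s %s" % (LABELS[begin], rest)
    return "No match"

for line in ["A:one\n", "B:two\n", "no colon here\n"]:
    print(describe(line))
```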

This is good, but I'm looking for something more general. The simple regex is just for explanatory purposes; the actual regexes would be fairly complex.

maerics 2010-04-13 23:52:17

Answer 4

A:

Paul McGuire's solution, an intermediate REMatcher class that performs the match, stores the match group, and returns a boolean for success or failure, turned out to produce the most legible code for this purpose.

maerics 2010-04-26 22:24:20
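Paul McGuire's code itself is not reproduced on this page, so the following is only a hypothetical reconstruction from the description above: a class that runs the match, keeps the result, and returns a boolean, which lets the tests read as a plain if/elif chain. The method names match and group are assumptions:

```python
import re

class REMatcher(object):
    """Hypothetical matcher that remembers the result of its last match."""

    def __init__(self, text):
        self.text = text
        self.last = None

    def match(self, pattern):
        # Run the match, store the match object, report success as a bool.
        self.last = re.match(pattern, self.text)
        return self.last is not None

    def group(self, i=0):
        # Only meaningful after a successful match().
        return self.last.group(i)

def handle(line):
    m = REMatcher(line)
    if m.match(r"^A:(.*)$"):
        return "FOO " + m.group(1)
    elif m.match(r"^B:(.*)$"):
        return "BAR " + m.group(1)
    else:
        return "No match"

for line in ["A:one", "B:two", "C:three"]:
    print(handle(line))
```

Because each match() call overwrites the stored result, group() always refers to the most recent test, which is what keeps the if/elif chain straightforward.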
