tags:

views:

134

answers:

3

I have a python script which process a file line by line, if the line matches a regex, it calls a function to handle it.

My question is is there a better write to refactor my script. The script works, but as it is, i need to keep indent to the right of the editor as I add more and more regex for my file.

Thank you for any idea. Now my code end up like this:

for line in fi.readlines():

       result= reg1.match(line)

       if result:
               handleReg1(result)

       else:
               result = reg2.match(line)

               if result:
                       handleReg2(result)
               else:
                       result = reg3.match(line)

                       if result:
                               handleReg3(result)
                       else:
                               result = reg4.match(line)

                               if result:
                                       handleReg4(result)
                               else:
                                       result = reg5.match(line)

                                       if result:
                                              handleReg5(result)
+12  A: 

I'd switch to using a data structure mapping regexes to functions. Something like:

map = { reg1: handleReg1, reg2: handleReg2, etc }

Then you just loop through them:

for reg, handler in map.items():
    result = reg.match(line)
    if result:
       handler(result)
       break

If you need the matches to happen in a particular order you'll need to use a list instead of a dictionary, but the principal is the same.

samtregar
This is the approach I'd recommend as well. As sam mentioned, you'll want to use a list if you need a certain order - probably something like `handlers = [(reg1, handleReg1), (reg2, handleReg2), ...]` and then your for loop would look like this: `for (reg,handler) in handlers:`
Amber
You definitely want a list and not a dictionary. There is no key lookup here -- it's just an association between a regex and a function. A list of tuples would be good.
hughdbrown
+1  A: 

Here's a trivial one:

handlers = { reg1 : handleReg1, ... }

for line in fi.readlines():
    for h in handlers:
        x = h.match(line)
        if x:
            handlers[h](x)

If there could be a line that matches several regexps this code will be different from the code you pasted: it will call several handlers. Adding break won't help, because the regexps will be tried in a different order, so you'll end up calling the wrong one. So if this is the case you should iterate over list:

handlers = [ (reg1, handleReg1), (reg2, handleReg2), ... ]

for line in fi.readlines():
    for reg, handler in handlers:
        x = reg.match(line)
        if x:
            handler(x)
            break
ilya n.
That's not identical; your way might call several handlers; his calls at most one.
balpha
The version above is not identical either as it may try regexps in order different from reg1, reg2, ...
ilya n.
A: 

An alternate approach that might work for you is to combine all the regexps into one giant regexp and use m.group() to detect which matched. My intuition says this should be faster, but I haven't tested it.

>>> reg = re.compile('(cat)|(dog)|(apple)')
>>> m = reg.search('we like dogs')
>>> print m.group()
dog
>>> print m.groups()
(None, 'dog', None)

This gets complicated if the regexps you're testing against are themselves complicated or use match groups.

Nelson
It's very fast, and the remedy against inner regexps that are complex and use match groups is to used named-groups around them (in a "namespace", e.g. w/a prefix, dijoint from any named groups that may be used by inner regexps).
Alex Martelli