Hi everyone!

I'm writing a simple Python parser that loops over each line in a file and processes it further if the right conditions are met. Here is my short start:

    def identify(hh_line):
        if(re.match(regex.new_round, hh_line)):
            m = re.match(regex.new_round, hh_line)
            # insert into psql

        ...

        if(re.match...

..and I was wondering what the best practice is for approaching this task, since this is the first time I've written Python.

Thanks! =)

+2  A: 

First of all, it's redundant to run the match twice - instead, run it, store the result, and branch off of that:

    m = re.match(regex.new_round, hh_line)
    if m:
        # ...
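
As a self-contained illustration of matching once and branching on the result, here is a minimal sketch. The `NEW_ROUND` pattern and its `number` label are hypothetical stand-ins for `regex.new_round`, which the question doesn't show:

```python
import re

# Hypothetical pattern standing in for regex.new_round from the question.
NEW_ROUND = re.compile(r"Round #(?P<number>\d+)")

def identify(hh_line):
    """Return the round number if the line starts a new round, else None."""
    m = NEW_ROUND.match(hh_line)  # run the match once and keep the result
    if m:
        # The stored match object gives access to the captured groups.
        return int(m.group("number"))
    return None
```

Compiling the pattern up front is optional; `re.match(pattern_string, hh_line)` behaves the same way here.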

Next, if you have a bunch of regex -> processing combinations, you might instead make a dict of regex -> function mappings, and then just iterate over it:

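    import re

    def process_a(data):
        # ...
        pass

    def process_b(data):
        # ...
        pass

    # Map each regex to the function that handles lines matching it.
    regex_to_process = {
        'regex_a': process_a,
        'regex_b': process_b,
    }

    for hh_line in <file object>:
        # items() works in both Python 2 and 3; iteritems() is Python 2 only.
        for regex, process in regex_to_process.items():
            m = re.match(regex, hh_line)
            if m:
                process(hh_line)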
Amber
Yes, I reckoned it was. =) Thanks!
laka
Thanks, that looks great. Just one follow-up: why can't I access, for example, m.group('title') in that loop? I have defined labels in the regex, and I can see them all using groupdict().
laka
You're using `(?P<name>expression)` syntax, correct? Not sure - could you show more code?
Amber
That's correct. There is really nothing more to show, but the grouping is freaky. The first regex contains 6-7 groups, all with labels. The second regex contains 3 groups, and when I try to print any group higher than 3, it fails. Why?
laka
Well, do keep in mind that the loop body runs for every regex, so if you try to look at a group that exists in one regex but not in another, it'll fail on the iteration for the second regex.
Amber
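
One way to avoid that failure, sketched here under assumptions, is to pass the match object to the handler paired with each regex, so every handler only reads the groups its own pattern defines. The patterns, labels, and handler names below are hypothetical and not taken from the original code:

```python
import re

# Hypothetical patterns and labels, for illustration only.
NEW_ROUND = re.compile(r"Round (?P<number>\d+): (?P<title>\w+)")
PLAYER = re.compile(r"Player (?P<name>\w+)")

def process_round(m):
    # Only the groups defined in NEW_ROUND exist on this match.
    return ("round", m.group("title"))

def process_player(m):
    # PLAYER has no 'title' group, so this handler never asks for it.
    return ("player", m.group("name"))

# Each pattern is paired with the handler that knows its groups.
HANDLERS = [
    (NEW_ROUND, process_round),
    (PLAYER, process_player),
]

def dispatch(hh_line):
    """Try every pattern; hand each successful match to its own handler."""
    results = []
    for pattern, handler in HANDLERS:
        m = pattern.match(hh_line)
        if m:
            results.append(handler(m))
    return results
```

If shared code does need to probe a label that may be absent, `m.groupdict().get('title')` returns `None` for a pattern without that group, whereas `m.group('title')` raises `IndexError`.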