ansaurus

Question

How should I organise my functions with pyparsing?

Answer 1

+1 A:

I could go either way on using a single big method to create your parser vs. taking it in steps the way you have it now.

I can see that you have defined some useful helper utilities, such as slit ("suppress Literal", I presume), stringtolits, and decimaltable. This looks good to me.

I like that you are using results names, they really improve the robustness of your post-parsing code. I would recommend using the shortcut form that was added in pyparsing 1.4.7, in which you can replace

busname.setResultsName("bus1")

with

busname("bus1")

This can declutter your code quite a bit.

I would look back through your parse actions to see where you are using numeric indexes to access individual tokens, and go back and assign results names instead. Here is one case, where GetStats returns (ngroup + sgroup).setParseAction(self.process_stats). process_stats has references like:

self.num_load = tokens[0]["loads"]
self.num_generator = tokens[0]["generators"]
self.num_transformer = tokens[0]["transformers"]
self.num_line = tokens[0]["lines"]
self.num_bus = tokens[0]["buses"]
self.power_rate = tokens[1]["rate"]

I like that you have Group'ed the values and the stats, but go ahead and give them names, like "network" and "soln". Then you could write this parse action code as (I've also converted to the - to me - easier-to-read object-attribute notation instead of dict element notation):

self.num_load = tokens.network.loads
self.num_generator = tokens.network.generators
self.num_transformer = tokens.network.transformers
self.num_line = tokens.network.lines
self.num_bus = tokens.network.buses
self.power_rate = tokens.soln.rate

Also, a style question: why do you sometimes use the explicit And constructor, instead of using the '+' operator?

busdef = And([busname.setResultsName("bus1"),
            busname.setResultsName("bus2"),
            integer.setResultsName("linenum"),
            decimaltable("pf qf pl ql".split())])

This is just as easily written:

busdef = (busname("bus1") + busname("bus2") + 
            integer("linenum") + 
            decimaltable("pf qf pl ql".split()))

Overall, I think this is about par for a file of this complexity. I have a similar format (proprietary, unfortunately, so cannot be shared) in which I built the code in pieces similar to the way you have, but in one large method, something like this:

def parser():
    header = Group(...)
    inputsummary = Group(...)
    jobstats = Group(...)
    measurements = Group(...)
    return header("hdr") + inputsummary("inputs") + jobstats("stats") + measurements("meas")

The Group constructs are especially helpful in a large parser like this, to establish a sort of namespace for results names within each section of the parsed data.

Paul McGuire 2009-12-08 14:29:02

Just the sort of answer I was looking for. The name shortcut will make things look nicer. object-attribute notation is new to me; does it work with all dicts? either way, thanks. I used and because when I wrote those parts I forgot about implied line continuation inside parentheses. I am thinking of having PSATReport_IO as a seperate class, there really isn't much other functions that are needed and it's looks too cluttered.

James Brooks 2009-12-08 14:43:40

dd = dict(zip("a b c d".split(),range(4))); dd.a; dd["a"]No it doesn't work. Nice syntax though. I might have to try and implement it in a custom dict class.

James Brooks 2009-12-08 14:47:19

No, object attribute-style referencing is a feature of pyparsing's ParseResults, not found in standard Python dicts. As an additional datapoint, my parser for the proprietary format is about 900 lines long.

Paul McGuire 2009-12-08 16:35:44

ansaurus

tags:

views:

answers:

How should I organise my functions with pyparsing?

related questions