
I have a message definition file that looks like this:

struct1 
{
  field="name" type="string" ignore="false"; 
  field="id" type="int" enums=" 0="val1" 1="val2" ";
}

struct2
{
  field = "object" type="struct1";
  ...
}

How can I parse this into a dictionary with the keys 'struct1' and 'struct2', where each value is a list of dictionaries, one per field line, so that I can do the following:

dict['struct1'][0]['type']  # Would return 'string'
dict['struct1'][1]['type']  # Would return 'int'
dict['struct1'][1]['enums']['0']  # Would return 'val1'
dict['struct2'][0]['type']  # Would return 'struct1'

and so on.

I can also change the format of the definition file, so if you have suggestions for a format that would be easier to parse, please let me know.

Thanks

+4  A: 

You can use JSON as the file format; it supports (in Python terms) dictionaries and lists. JSON support is built in only from Python 2.6 onward, so on older versions you'll need this library: http://pypi.python.org/pypi/simplejson/2.0.9

{ "struct1":
    [
        {"field" : "name", "type" : "string", "ignore" : false },
        {"field" : "id", "type" : "int", "enums" : { "0": "val1", "1": "val2"}}
    ],
  "struct2":
    [ ... ]
}

Python part (sketched, not tested):

>>> import simplejson as json
>>> d = json.loads(yourjsonstring)
>>> print(d['struct1'][0]['field'])
name
>>> print(d['struct1'][1]['enums']['0'])
val1
...
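Because `json` has been in the standard library since Python 2.6, a common idiom is to try it first and fall back to simplejson only on older interpreters, so nothing extra has to be installed on 2.6 and later. A small sketch (the `load_messages` name is just for illustration):

```python
try:
    import json  # in the standard library since Python 2.6
except ImportError:
    import simplejson as json  # third-party fallback for older Pythons


def load_messages(path):
    """Read a JSON message-definition file into nested dicts and lists."""
    with open(path) as f:
        # json.load() parses straight from the file object.
        return json.load(f)
```

Usage: `d = load_messages("messages.json")` (an example file name). Nested JSON objects such as `"enums"` come back as nested dicts, so a lookup like the question's `['enums']['0']` works directly.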
The MYYN
How would I write the nested dictionary part? I want enums to be a dictionary in the second line.
TP
I liked this answer, hence the upvote, but I found that I couldn't install the library in the production environment, so I could not use this method.
TP
+4  A: 

Use YAML instead. There is the PyYAML library for Python. It is heavily used by Google App Engine.

This is just a friendly suggestion :-)

Example (mapping scalars to sequences):

american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves
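Applied to the two structs from the question, the same mapping-to-sequence style might look like the sketch below. The enum keys are quoted so they load as the strings "0" and "1", matching the question's `['enums']['0']` lookups; unquoted, YAML would read them as integers. Loading the file with PyYAML's `yaml.safe_load()` should then give the nested dicts and lists the question asks for.

```yaml
struct1:
  - field: name
    type: string
    ignore: false
  - field: id
    type: int
    enums:
      "0": val1
      "1": val2
struct2:
  - field: object
    type: struct1
```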

There is also JSON, of course, which is well supported in Python (but tends to hurt my fingers a bit more ;-)

jldupont
+4  A: 

Might I recommend YAML? IMHO the syntax is more readable for data entry, and you don't have to write and maintain a parser. Eschew XML: it is good for marking up text, but not for data entry, since all the duplicated tags make it hard for a human to read.

Ross Rogers
What would the YAML file look like for the two structs above? Could you provide an example?
TP
Deferring to jldupont :-)
Ross Rogers
A: 

Since you are free to change the file format, you could switch to any of several formats that have Python libraries for reading and writing them, for example JSON, YAML, XML, or even the built-in ConfigParser module.

[struct1]
field: name
type: string
ignore: false
# etc.
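One wrinkle with ConfigParser is that a section cannot usefully repeat a key (Python 3's `configparser` rejects duplicate keys by default), so a struct with several fields needs more than one section. A minimal Python 3 sketch, assuming a hypothetical convention of one section per field named `<struct>.<index>` and a made-up `key=value, key=value` encoding for enums, could regroup the sections like this:

```python
import configparser
from collections import defaultdict

# Hypothetical layout: one section per field, named "<struct>.<index>",
# because one INI section cannot hold several "field"/"type" keys.
INI_TEXT = """
[struct1.0]
field: name
type: string
ignore: false

[struct1.1]
field: id
type: int
enums: 0=val1, 1=val2

[struct2.0]
field: object
type: struct1
"""


def load_messages(text):
    """Group INI sections into {struct_name: [field_dict, ...]}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    grouped = defaultdict(list)
    for section in parser.sections():
        struct_name, index = section.rsplit(".", 1)
        entry = dict(parser.items(section))
        if "ignore" in entry:
            # getboolean() accepts false/no/off/0 and true/yes/on/1.
            entry["ignore"] = parser.getboolean(section, "ignore")
        if "enums" in entry:
            # Hypothetical "key=value, key=value" encoding for enum values.
            pairs = (item.split("=", 1) for item in entry["enums"].split(","))
            entry["enums"] = {k.strip(): v.strip() for k, v in pairs}
        grouped[struct_name].append((int(index), entry))
    # Order each struct's fields by the index in the section name.
    return {name: [e for _, e in sorted(fields, key=lambda p: p[0])]
            for name, fields in grouped.items()}


messages = load_messages(INI_TEXT)
print(messages["struct1"][1]["enums"]["0"])  # val1
```

The extra convention is the price of INI's flat sections; formats with native lists (JSON, YAML) avoid it.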
Jonathan Feinberg
+1  A: 

Pyparsing is a nice, easy-to-use library. That is what I would use.

http://pyparsing.wikispaces.com/
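If installing pyparsing is not an option (the asker notes in a comment above that extra packages could not be installed in production), the original format is simple enough for a standard-library sketch with `re`. This is not pyparsing; it is a rough regex-based alternative that assumes values never contain `;`, `{` or `}`, and that `enums` is the only attribute whose value holds nested quotes:

```python
import re

# A struct name followed by a brace-delimited body (bodies contain no braces).
BLOCK_RE = re.compile(r'(\w+)\s*\{(.*?)\}', re.DOTALL)
# key="value" pairs, tolerating spaces around "=" as in: field = "object"
PAIR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')
# enums=" ... " runs to the statement's last quote, because its value
# itself contains quoted pairs.
ENUMS_RE = re.compile(r'enums\s*=\s*"(.*)"\s*$', re.DOTALL)


def parse_definitions(text):
    """Return {struct_name: [field_dict, ...]} for the question's format."""
    result = {}
    for name, body in BLOCK_RE.findall(text):
        fields = []
        for stmt in body.split(";"):
            stmt = stmt.strip()
            entry = {}
            m = ENUMS_RE.search(stmt)
            if m:
                entry["enums"] = dict(PAIR_RE.findall(m.group(1)))
                stmt = stmt[:m.start()]
            entry.update(PAIR_RE.findall(stmt))
            if entry:  # skips empty statements and placeholders like "..."
                fields.append(entry)
        result[name] = fields
    return result


SAMPLE = '''
struct1
{
  field="name" type="string" ignore="false";
  field="id" type="int" enums=" 0="val1" 1="val2" ";
}

struct2
{
  field = "object" type="struct1";
  ...
}
'''

d = parse_definitions(SAMPLE)
print(d["struct1"][1]["enums"]["0"])  # val1
```

All values stay strings (e.g. `ignore` is `"false"`); convert them afterwards if needed.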

James Brooks
+2  A: 

I would simply use Python for the message definition file format.

Let your message definition file be a plain Python file:

# file messages.py
messages = dict(
    struct1=[
        dict(field="name", type="string", ignore=False),
        dict(field="id", type="int", enums={0: "val1", 1: "val2"}),
        ],
    struct2=[
        dict(field="object", type="struct1"),
        ]
    )

Your program can then import and use that data structure directly:

# in your program
from messages import messages
print(messages['struct1'][0]['type'])
print(messages['struct1'][1]['type'])
print(messages['struct1'][1]['enums'][0])
print(messages['struct2'][0]['type'])

Using this approach, you let Python do the parsing for you.

You also gain a lot of flexibility. For instance, imagine you (for some strange reason) have a message structure with 1000 fields named "field_N". With a conventional file format you would have to write 1000 lines of field definitions (unless you build some looping into your config file parser, at which point you are on your way to creating a programming language anyway). Using Python, you could do something like:

messages = dict(
    ...
    strange_msg=[dict(field="field_%d" % i) for i in range(1000)],
    ...
    )

BTW, on Python 2.6 and later, using named tuples instead of dicts is an option. Or use one of the numerous "Bunch" classes available (see the Python Cookbook for a namedtuple recipe for 2.5).
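A sketch of the named-tuple variant, assuming Python 3.7+ for the `defaults` argument (the `Field` type and its defaults are illustrative choices, not part of the answer):

```python
from collections import namedtuple

# Hypothetical record type; `defaults` (Python 3.7+) applies to the
# rightmost fields, so `ignore` defaults to False and `enums` to None.
Field = namedtuple("Field", ["field", "type", "ignore", "enums"],
                   defaults=(False, None))

messages = {
    "struct1": [
        Field("name", "string", ignore=False),
        Field("id", "int", enums={0: "val1", 1: "val2"}),
    ],
    "struct2": [
        Field("object", "struct1"),
    ],
}

print(messages["struct1"][0].type)      # attribute access instead of ["type"]
print(messages["struct1"][1].enums[0])  # val1
```

Compared with dicts, a misspelled attribute raises immediately, and each field record is immutable.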

EDIT:

Below is code that reads the message definition files named on the command line. It executes each file directly instead of importing it as a module.

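# file mainprogram.py

def read_messages_from_file(filename):
    module_dict = {}
    # execfile() is gone in Python 3; exec(compile(...)) is the usual replacement.
    with open(filename) as f:
        exec(compile(f.read(), filename, "exec"), module_dict)
    return module_dict['messages']

if __name__ == "__main__":
    from pprint import pprint
    import sys

    # Each command-line argument is the path of a message definition file.
    for arg in sys.argv[1:]:
        messages = read_messages_from_file(arg)
        pprint(messages)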

Executing:

$ python mainprogram.py messages1 messages2 messages3

will read and print the messages defined in each file.
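`execfile` no longer exists in Python 3. An alternative available in Python 2.7 and 3 is `runpy.run_path()`, which runs a file and returns its global namespace as a dict. A minimal sketch of the same function built on it:

```python
import runpy
import sys
from pprint import pprint


def read_messages_from_file(filename):
    """Run a message definition file and return its `messages` variable."""
    # run_path() executes the file in a fresh namespace and returns that
    # namespace as a dict of the file's globals.
    return runpy.run_path(filename)["messages"]


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        pprint(read_messages_from_file(arg))
```

Like `execfile`, this runs arbitrary code from the definition file, so the trust discussion in the comments applies equally.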

codeape
Say I have 3 separate files whose names I do not know when writing the script, and I want each file name passed in as an argument. How would that be done?
TP
I can see the advantage in this, but it feels wrong, like a massive security risk, or an opportunity for your code to mess up in mysterious ways. I'm not sure if that makes sense, though.
James Brooks
If you think this is a security risk, the first thing you should ask yourself is: who is the guy that would add malicious code to the message definition files? Find that guy and get rid of him. And if he exists, what is stopping him from editing the program itself?
codeape
@Jaelebi: Edited the answer with info on reading files specified at runtime.
codeape