ansaurus

Question

Answer 1

+4 A:

You can use JSON as the file format; it supports (in Python lingo) dictionaries and lists. Since JSON support is built in only for Python 2.6 and higher, you'll need this library: http://pypi.python.org/pypi/simplejson/2.0.9

{ "struct1" :
    [
        {"field" : "name", "type" : "string", "ignore" : false },
        {"field" : "id", "type" : "int", "0" : "val1", "1" : "val2" },
        {"field" : "id", "type" : "int", "enums" : { "0": "val1", "1": "val2"}}
    ],
  "struct2" :
    [ ... ]
}

python part (sketched, not tested):

>>> import simplejson as json
>>> d = json.loads(yourjsonstring)
>>> d['struct1'][0]['field']
name
>>> d['struct1'][2]['enums']['0']
val1
...
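
For reference, here is a minimal, self-contained sketch of the same lookup using the standard-library json module (Python 2.6 and later) instead of simplejson. The SAMPLE string and the enums conversion are illustrative assumptions, not part of the original answer; the elided "struct2" is left out. Note that JSON object keys are always strings, so the enum keys come back as "0" and "1" and need converting if integers are wanted.

```python
import json

# Illustrative sample data (an assumption): one struct with two fields,
# written with the separators JSON requires.
SAMPLE = """
{
  "struct1": [
    {"field": "name", "type": "string", "ignore": false},
    {"field": "id", "type": "int", "enums": {"0": "val1", "1": "val2"}}
  ]
}
"""

d = json.loads(SAMPLE)

# JSON keys are strings, so convert them when integer enum values are needed.
enums = dict((int(key), value) for key, value in d["struct1"][1]["enums"].items())

print(d["struct1"][0]["field"])
print(enums)
```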

The MYYN 2009-12-17 15:20:47

How would I write the nested dictionary part? For example, I want the enum as a dictionary in the second line.

TP 2009-12-17 15:32:54

I liked this answer, hence the upvote, but I found that I couldn't install the plugin on the production environment, so I could not use this method.

TP 2009-12-17 20:32:47

Answer 2

+4 A:

Use YAML instead. There is a PyYAML library for Python. It is heavily used by Google App Engine.

This is just a friendly suggestion :-)

Example ( Mapping Scalars to Sequences ):

american:
  - Boston Red Sox
  - Detroit Tigers
  - New York Yankees
national:
  - New York Mets
  - Chicago Cubs
  - Atlanta Braves

There is also JSON, of course, which has ample support in Python (but tends to hurt my fingers a bit more ;-)

jldupont 2009-12-17 15:20:53

Answer 3

+4 A:

Might I recommend YAML? IMHO the syntax is more readable for data entry, and then you don't have to write and maintain a parser. Eschew XML -- it is good for marking up text, but not good for data entry since the text isn't human readable with all the duplicate tags everywhere.

Ross Rogers 2009-12-17 15:21:13

What would the file format be in YAML for the two structs above? Could you provide an example?

TP 2009-12-17 15:33:42

Deferring to jldupont :-)

Ross Rogers 2009-12-17 15:35:13

Answer 4

A:

Since you are at liberty to change the file format, you could change it to any of several formats that have Python libraries to read and write. For example, JSON, YAML, XML, or even the built-in ConfigParser.

[struct1]
field: name
type: string
ignore: false
# etc.
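
A minimal sketch of reading a section like the one above, assuming Python 3, where the module is spelled configparser (on Python 2 it is ConfigParser and there is no read_string). The SAMPLE string is an illustrative assumption. Keys within a section must be unique, so a struct with several fields would need some layout convention of your own, such as one section per field.

```python
import configparser  # spelled ConfigParser on Python 2

# Illustrative sample data (an assumption), matching the section shown above.
SAMPLE = """\
[struct1]
field: name
type: string
ignore: false
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)  # use parser.read(filename) for a real file

field = parser.get("struct1", "field")
ignore = parser.getboolean("struct1", "ignore")  # parses "false" as False

print(field, ignore)
```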

Jonathan Feinberg 2009-12-17 15:21:55

Answer 5

+1 A:

Pyparsing is a nice, easy-to-use library. That's what I would use.

http://pyparsing.wikispaces.com/

James Brooks 2009-12-17 15:27:19

Answer 6

+2 A:

I would simply use Python for the message definition file format.

Let your message definition file be a plain Python file:

# file messages.py
messages = dict(
    struct1=[
        dict(field="name", type="string", ignore=False),
        dict(field="id", type="int", enums={0: "val1", 1: "val2"}),
        ],
    struct2=[
        dict(field="object", type="struct1"),
        ]
    )

Your program can then import and use that data structure directly:

# in your program
from messages import messages
print messages['struct1'][0]["type"]
print messages['struct1'][1]['type']
print messages['struct1'][1]['enums'][0]
print messages['struct2'][0]['type']

Using this approach, you let Python do the parsing for you.

And you also gain a lot of possibilities. For instance, imagine you (for some strange reason) have a message structure with 1000 fields named "field_N". Using a conventional file format you would have to add 1000 lines of field definitions (unless you build some looping into your config file parser - you are then on your way to creating a programming language anyway). Using Python for this purpose, you could do something like:

messages = dict(
    ...
    strange_msg=[dict(field="field_%d" % i) for i in range(1000)],
    ...
    )

BTW, on Python 2.6, using named tuples instead of dict is an option. Or use one of the numerous "Bunch" classes available (see the Python cookbook for a namedtuple for 2.5).
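
A minimal sketch of the named-tuple variant, assuming Python 2.6 or later. The names Field and make_field are hypothetical, introduced only for illustration; the helper supplies defaults so that each definition only lists what it needs.

```python
from collections import namedtuple

# Hypothetical record type for one field definition.
Field = namedtuple("Field", "field type ignore enums")


def make_field(field, type, ignore=False, enums=None):
    """Build a Field, filling in defaults for the optional attributes."""
    return Field(field, type, ignore, enums or {})


messages = dict(
    struct1=[
        make_field("name", "string"),
        make_field("id", "int", enums={0: "val1", 1: "val2"}),
    ],
    struct2=[
        make_field("object", "struct1"),
    ],
)

# Attribute access replaces the dictionary lookups.
print(messages["struct1"][1].enums[0])
```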

EDIT:

Below is code that reads message definition files as specified on the command line. It uses execfile instead of import.

# file mainprogram.py

def read_messages_from_file(filename):
    module_dict = {}
    execfile(filename, module_dict)
    return module_dict['messages']

if __name__ == "__main__":
    from pprint import pprint
    import sys

    for arg in sys.argv[1:]:
        messages = read_messages_from_file(arg)
        pprint(messages)

Executing:

$ python mainprogram.py messages1 messages2 messages3

will read and print the messages defined in each file.
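
Note that execfile exists only on Python 2. A minimal sketch of the same loader for later Python versions, using runpy.run_path from the standard library and assuming each definition file still binds a top-level name called messages:

```python
import runpy
import sys
from pprint import pprint


def read_messages_from_file(filename):
    """Execute the file and return the 'messages' object it defines."""
    namespace = runpy.run_path(filename)  # dict of the file's globals
    return namespace["messages"]


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        pprint(read_messages_from_file(arg))
```

As with execfile, this runs whatever code the file contains, so it is only suitable for definition files you trust.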

codeape 2009-12-17 15:27:42

Say I have 3 separate files whose names I do not know when writing the script, and I want the file name to be passed in as an argument. How would that be done?

TP 2009-12-17 15:40:53

I can see the advantage in this but it feels bad, like a massive security risk. Or an opportunity for your code to magically mess up. I'm not sure if that makes sense though.

James Brooks 2009-12-17 15:41:15

If you think this is a security risk, the first thing you should ask yourself is: who is the guy that will add malicious code to the message definition files? Find that guy and get rid of him. And if he exists, what is stopping him from editing the program itself?

codeape 2009-12-17 15:57:10

@Jaelebi: Edited the answer with info on reading files specified at runtime.

codeape 2009-12-17 15:57:45


Parsing a datafile in python (2.5.2)
