ansaurus

Question

What's the best way(error proof / foolproof) to parse a file using python with following format?

Answer 1

+4 A:

Well, the data looks pretty regular. So you could do something like this (untested):

class Block(object):
    def __init__(self, name):
        self.name = name

infile = open(...)  # insert filename here
current = None
blocks = []

for line in infile:
    if line.lstrip().startswith('#'):
        continue
    elif line.rstrip().endswith('{'):
        current = Block(line.split()[0])
    elif '=' in line:
        attr, value = line.strip().split('=')
        try:
            value = int(value)
        except ValueError:
            pass
        setattr(current, attr, value)
    elif line.rstrip().endswith('}'):
        blocks.append(current)

The result will be a list of Block instances, where block.name will be the name ('block1', 'block2', etc.) and other attributes correspond to the keys in your data. So, blocks[0].value will be 'data', etc. Note that this only handles strings and integers as values.

(there is an obvious bug here if your keys can ever include 'name'. You might like to change self.name to self._name or something if this can happen)

HTH!

John Fouhy 2009-01-29 21:34:23

If the input file is *invalid* and starts with `value=data` your program will crash because `current` has not been initialized. As the old saying goes, *"Garbage In, Garbage out"*. Anyway +1 from me.

Cristian Ciupitu 2009-10-31 16:12:42

Answer 2

+2 A:

You might look into something like pyparsing.

Fred Larson 2009-01-29 22:15:12

Answer 3

+3 A:

If you do not really mean parsing, but rather text processing and the input data is really that regular, then go with John's solution. If you really need some parsing (like there are some a little more complex rules to the data that you are getting), then depending on the amount of data that you need to parse, I'd go either with pyparsing or simpleparse. I've tried both of them, but actually pyparsing was too slow for me.

Bartosz Radaczyński 2009-01-29 22:24:17

Answer 4

+5 A:

The best way would be to use an existing format such as JSON.

Here's an example parser for your format:

from lepl import (AnyBut, Digit, Drop, Eos, Integer, Letter,
                  NON_GREEDY, Regexp, Space, Separator, Word)

# EBNF
# name = ( letter | "_" ) , { letter | "_" | digit } ;
name = Word(Letter() | '_',
            Letter() | '_' | Digit())
# words = word , space+ , word , { space+ , word } ;
# two or more space-separated words (non-greedy to allow comment at the end)
words = Word()[2::NON_GREEDY, ~Space()[1:]] > list
# value = integer | word | words  ;
value = (Integer() >> int) | Word() | words
# comment = "#" , { all characters - "\n" } , ( "\n" | EOF ) ;
comment = '#' & AnyBut('\n')[:] & ('\n' | Eos())

with Separator(~Regexp(r'\s*')):
    # statement = name , "=" , value ;
    statement = name & Drop('=') & value > tuple
    # suite     = "{" , { comment | statement } , "}" ;
    suite     = Drop('{') & (~comment | statement)[:] & Drop('}') > dict
    # block     = name , suite ;
    block     = name & suite > tuple
    # config    = { comment | block } ;
    config    = (~comment | block)[:] & Eos() > dict

from pprint import pprint

pprint(config.parse(open('input.cfg').read()))

Output:

[{'block1': {'othervalue': 32423432,
             'some_value': ['some', 'other', 'kind', 'of', 'data'],
             'value': 'data'},
  'block2': {'othervalue': 32423432,
             'some_value': ['some', 'other', 'kind', 'of', 'data'],
             'value': 'data'}}]

J.F. Sebastian 2009-10-31 16:05:47

ansaurus

tags:

views:

answers:

What's the best way(error proof / foolproof) to parse a file using python with following format?

related questions