########################################
# some comment
# other comment
########################################
block1 {
value=data
some_value=some other kind of data
othervalue=032423432
}
block2 {
value=data
some_value=some other kind of data
othervalue=032423432
}
views:
205answers:
4What's the best way(error proof / foolproof) to parse a file using python with following format?
Well, the data looks pretty regular. So you could do something like this (untested):
class Block(object):
def __init__(self, name):
self.name = name
infile = open(...) # insert filename here
current = None
blocks = []
for line in infile:
if line.lstrip().startswith('#'):
continue
elif line.rstrip().endswith('{'):
current = Block(line.split()[0])
elif '=' in line:
attr, value = line.strip().split('=')
try:
value = int(value)
except ValueError:
pass
setattr(current, attr, value)
elif line.rstrip().endswith('}'):
blocks.append(current)
The result will be a list of Block instances, where block.name
will be the name ('block1'
, 'block2'
, etc.) and other attributes correspond to the keys in your data. So, blocks[0].value
will be 'data', etc. Note that this only handles strings and integers as values.
(there is an obvious bug here if your keys can ever include 'name'. You might like to change self.name
to self._name
or something if this can happen)
HTH!
If you do not really mean parsing, but rather text processing and the input data is really that regular, then go with John's solution. If you really need some parsing (like there are some a little more complex rules to the data that you are getting), then depending on the amount of data that you need to parse, I'd go either with pyparsing or simpleparse. I've tried both of them, but actually pyparsing was too slow for me.
The best way would be to use an existing format such as JSON.
Here's an example parser for your format:
from lepl import (AnyBut, Digit, Drop, Eos, Integer, Letter,
NON_GREEDY, Regexp, Space, Separator, Word)
# EBNF
# name = ( letter | "_" ) , { letter | "_" | digit } ;
name = Word(Letter() | '_',
Letter() | '_' | Digit())
# words = word , space+ , word , { space+ , word } ;
# two or more space-separated words (non-greedy to allow comment at the end)
words = Word()[2::NON_GREEDY, ~Space()[1:]] > list
# value = integer | word | words ;
value = (Integer() >> int) | Word() | words
# comment = "#" , { all characters - "\n" } , ( "\n" | EOF ) ;
comment = '#' & AnyBut('\n')[:] & ('\n' | Eos())
with Separator(~Regexp(r'\s*')):
# statement = name , "=" , value ;
statement = name & Drop('=') & value > tuple
# suite = "{" , { comment | statement } , "}" ;
suite = Drop('{') & (~comment | statement)[:] & Drop('}') > dict
# block = name , suite ;
block = name & suite > tuple
# config = { comment | block } ;
config = (~comment | block)[:] & Eos() > dict
from pprint import pprint
pprint(config.parse(open('input.cfg').read()))
Output:
[{'block1': {'othervalue': 32423432,
'some_value': ['some', 'other', 'kind', 'of', 'data'],
'value': 'data'},
'block2': {'othervalue': 32423432,
'some_value': ['some', 'other', 'kind', 'of', 'data'],
'value': 'data'}}]