What is the best method to implement a system for parsing a configuration file based on a set of rules? I would appreciate any pointers in the direction of best practices or existing implementations.

Edit: I have not settled on a specific language yet, but I am comfortable with both Perl and Python. The files are along the lines of router/switch configuration files, with different functional sections.

+2  A: 

Assuming that this is not an XML-based configuration file, may I recommend ANTLR?

  • Generates parser code from an EBNF-style grammar rules file that you provide.
  • Has a graphical editor for the rules file, available as an Eclipse plugin.
  • Very strong and sound parser technology.
  • Flexible in terms of what you want to do with the parsed output.
  • Runtime environments permit parsing with ANTLR in C++, C, C#, Java, Python, and Ruby applications.
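
To make the "grammar rules" idea concrete: a sectioned, router-style config can be described with two rules, roughly `config : section*` and `section : header child*`. The sketch below hand-codes those two rules in Python so the shape of the result is visible; it is not ANTLR output, and ANTLR would instead generate the parser from a grammar file. The sample config, the name `parse_config`, and the convention that child lines are indented and `!` lines are separators are all assumptions for illustration.

```python
# Hypothetical router-style config: top-level lines open a section,
# indented lines belong to the most recent section, '!' lines separate.
SAMPLE = """\
hostname edge-router-1
!
interface GigabitEthernet0/1
 description uplink
 ip address 10.0.0.1 255.255.255.0
!
router ospf 1
 network 10.0.0.0 0.0.0.255 area 0
"""


def parse_config(text):
    """Return a list of (header, [child lines]) tuples, one per section."""
    sections = []
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.lstrip().startswith("!"):
            continue  # skip blank lines and '!' separator lines
        if line[0].isspace():
            # Rule "child": an indented line attaches to the open section.
            if not sections:
                raise ValueError(f"indented line outside a section: {line!r}")
            sections[-1][1].append(line.strip())
        else:
            # Rule "header": an unindented line starts a new section.
            sections.append((line, []))
    return sections


for header, children in parse_config(SAMPLE):
    print(header, children)
```

A hand-written loop like this is fine for two rules; once the sections have their own nested syntax, a generated parser tends to be easier to maintain than growing the `if` chain.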
Glenn
For XML, see my answer about ANTXR below -- it's based on ANTLR.
Scott Stanchfield
A: 

I often use YAML for config files; it's lightweight, and there are a ton of libraries supporting it in different languages.

http://www.yaml.org/
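
As a rough illustration in Python, here is a minimal sketch that assumes the third-party PyYAML package is installed (`pip install pyyaml`). The config keys (`hostname`, `interfaces`, `name`, `address`) are made up for the example and are not from any real device format.

```python
import yaml  # third-party: PyYAML (assumed to be installed)

# Hypothetical config with one scalar setting and one list of sections.
CONFIG_TEXT = """\
hostname: edge-router-1
interfaces:
  - name: GigabitEthernet0/1
    description: uplink
    address: 10.0.0.1/24
  - name: GigabitEthernet0/2
    description: lan
    address: 192.168.1.1/24
"""

# safe_load builds plain dicts/lists/strings and refuses arbitrary object tags.
config = yaml.safe_load(CONFIG_TEXT)

for iface in config["interfaces"]:
    print(iface["name"], iface["address"])
```

This only helps if you control the file format; it won't parse an existing router config that isn't already YAML.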

Adam Pope
A: 

If you're thinking of XML and using Java, you can try my XML parser generator, ANTXR, which is based on ANTLR 2.7.x.

See http://javadude.com/tools/antxr/index.html for details

An example:

XML File:

<?xml version="1.0"?>
<people>
  <person ssn="111-11-1111">
    <firstName>Terence</firstName>
    <lastName>Parr</lastName>
  </person>
  <person ssn="222-22-2222">
    <firstName>Scott</firstName>
    <lastName>Stanchfield</lastName>
  </person>
  <person ssn="333-33-3333">
    <firstName>James</firstName>
    <lastName>Stewart</lastName>
  </person>
</people>

A parser skeleton:

header {
package com.javadude.antlr.sample.xml;
}

class PeopleParser extends Parser;

document
  : <people> EOF;

<people>
  : (<person>)*
  ;

<person> 
  : ( <firstName>
    | <lastName>
    )*
  ;

<firstName>
  : PCDATA
  ;

<lastName>
  : PCDATA
  ;

A parser that actually does something with the data:

header {
package com.javadude.antlr.sample.xml;

import java.util.List;
import java.util.ArrayList;
}

class PeopleParser extends Parser;


document returns [List results = null]
  : results=<people> EOF
  ;

<people> returns [List results = new ArrayList()]
  { Person p; }
  : ( p=<person>  { results.add(p); }   )*
  ;

<person> returns [Person p = new Person()]
  { String first, last; }
  : ( first=<firstName>  { p.setFirstName(first); }
    | last=<lastName>    { p.setLastName(last);   }
    )*
  ;

<firstName> returns [String value = null]
  : pcdata:PCDATA { value = pcdata.getText(); }
  ;

<lastName> returns [String value = null]
  : pcdata:PCDATA { value = pcdata.getText(); }
  ;

I've been using this for years, and when I've shown it to folks at work, after the initial "getting used to a grammar" learning curve, they really love it.

Note that you can use a SAX or XMLPull front-end (and the front-end can do validation if you like). The code to run the parser looks like this:

// Create our scanner (using a simple SAX parser setup)
BasicCrimsonXMLTokenStream stream =
    new BasicCrimsonXMLTokenStream(new FileReader("people.xml"),
                                   PeopleParser.class, false, false);


// Create our ANTLR parser
PeopleParser peopleParser = new PeopleParser(stream);

// parse the document
peopleParser.document();
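
For anyone working in Python rather than Java, the same extraction (one object per <person>, holding first and last name) can be sketched with the standard library's `xml.etree.ElementTree`. To be clear, this is not ANTXR and involves no grammar; it is a plain tree walk shown only for comparison. The `Person` dataclass and `parse_people` function are hypothetical names, and the XML is inlined here instead of being read from people.xml.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Same data as the people.xml example, inlined for a self-contained sketch.
PEOPLE_XML = """<?xml version="1.0"?>
<people>
  <person ssn="111-11-1111">
    <firstName>Terence</firstName>
    <lastName>Parr</lastName>
  </person>
  <person ssn="222-22-2222">
    <firstName>Scott</firstName>
    <lastName>Stanchfield</lastName>
  </person>
  <person ssn="333-33-3333">
    <firstName>James</firstName>
    <lastName>Stewart</lastName>
  </person>
</people>
"""


@dataclass
class Person:
    first_name: str
    last_name: str


def parse_people(xml_text):
    """Build one Person per <person> element under the root <people>."""
    root = ET.fromstring(xml_text)
    return [
        Person(
            first_name=node.findtext("firstName"),
            last_name=node.findtext("lastName"),
        )
        for node in root.findall("person")
    ]


people = parse_people(PEOPLE_XML)
for person in people:
    print(person.first_name, person.last_name)
```

Unlike a grammar, this tree walk silently ignores elements it doesn't expect, so it checks the document's structure far less strictly.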
Scott Stanchfield