If you're considering XML and using Java, you can try my XML parser generator, ANTXR, which is based on ANTLR 2.7.x.
See http://javadude.com/tools/antxr/index.html for details.
An example:
XML File:
<?xml version="1.0"?>
<people>
    <person ssn="111-11-1111">
        <firstName>Terence</firstName>
        <lastName>Parr</lastName>
    </person>
    <person ssn="222-22-2222">
        <firstName>Scott</firstName>
        <lastName>Stanchfield</lastName>
    </person>
    <person ssn="333-33-3333">
        <firstName>James</firstName>
        <lastName>Stewart</lastName>
    </person>
</people>
A parser skeleton:
header {
    package com.javadude.antlr.sample.xml;
}

class PeopleParser extends Parser;

document
    : <people> EOF
    ;

<people>
    : (<person>)*
    ;

<person>
    : ( <firstName>
      | <lastName>
      )*
    ;

<firstName>
    : PCDATA
    ;

<lastName>
    : PCDATA
    ;
A parser that actually does something with the data:
header {
    package com.javadude.antlr.sample.xml;
    import java.util.List;
    import java.util.ArrayList;
}

class PeopleParser extends Parser;

document returns [List results = null]
    : results=<people> EOF
    ;

<people> returns [List results = new ArrayList()]
    { Person p; }
    : ( p=<person> { results.add(p); } )*
    ;

<person> returns [Person p = new Person()]
    { String first, last; }
    : ( first=<firstName> { p.setFirstName(first); }
      | last=<lastName> { p.setLastName(last); }
      )*
    ;

<firstName> returns [String value = null]
    : pcdata:PCDATA { value = pcdata.getText(); }
    ;

<lastName> returns [String value = null]
    : pcdata:PCDATA { value = pcdata.getText(); }
    ;
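The grammar above calls `new Person()`, `setFirstName` and `setLastName`, so it assumes a `Person` class is visible to the generated parser. That class isn't shown here; a minimal sketch of what it might look like follows. The getters and `toString` are my additions for illustration, and the package declaration is left out so the sketch compiles on its own (in practice it would sit in the same package as the parser, or be imported in the `header` block).

```java
// Hypothetical bean matching the calls made in the grammar actions:
// new Person(), setFirstName(String), setLastName(String).
// The getters and toString() are illustrative additions, not required by the grammar.
class Person {
    private String firstName;
    private String lastName;

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }

    // Handy when printing the parsed results
    public String toString() {
        return firstName + " " + lastName;
    }
}
```

Note that the grammar doesn't read the `ssn` attribute, so this sketch doesn't carry it either.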
I've been using this for years. When I've shown it to folks at work, they really love it once they're past the initial "getting used to a grammar" learning curve.
Note that you can use either a SAX or an XMLPull front-end (and the front-end can do validation if you like). The code to run the parser looks like this:
// Create our scanner (using a simple SAX parser setup)
BasicCrimsonXMLTokenStream stream =
    new BasicCrimsonXMLTokenStream(new FileReader("people.xml"),
                                   PeopleParser.class, false, false);

// Create our ANTLR parser
PeopleParser peopleParser = new PeopleParser(stream);

// Parse the document, starting at the "document" rule
// (with the second grammar above, this call returns the List of Person objects)
peopleParser.document();