Hi,

I am new to Antlr, but have used Flex/Bison before. I want to know if what I want to do using Antlr is possible.

I want to parse a PDDL file using Antlr and, while it is being parsed, build up my own representation of its contents in a Java class that I wrote (in the actions for the rules?). Once parsing has finished, I want to return that object representation to the Java program so it can run other operations on it.

So essentially, I want to invoke an ANTLR-produced PDDL parser on a PDDL file from inside a Java program and have it return an object that describes the PDDL file to the main Java program.

Is this possible? I have tried looking at the documentation, but haven't found a good answer.

Thanks very much.

A: 

This is certainly possible, since Antlr is designed to generate parsers that are then invoked as part of a larger system (e.g., a compiler or a static code analyzer).

Start with Terence Parr's The Definitive ANTLR Reference: Building Domain-Specific Languages. He's the Antlr author, and also an unusually clear and jargon-free teacher on language processing.

Martin Fowler's Domain-Specific Languages uses Antlr in a lot of its examples. For instance on page 200 he shows a simple "Hello World" example where a Java program calls Antlr to parse a file of people to greet, and while doing it emits the greetings. Here's where the work gets done (page 206):

class GreetingsLoader...
  public void run() {
    try {
      GreetingsLexer lexer = new GreetingsLexer(new ANTLRReaderStream(input));
      GreetingsParser parser = new GreetingsParser(new CommonTokenStream(lexer));
      parser.helper = this;
      parser.script();
      if (hasErrors()) throw new RuntimeException("it all went pear-shaped\n" + errorReport());
    } catch (IOException e) {
      throw new RuntimeException(e);
    } catch (RecognitionException e) {
      throw new RuntimeException(e);
    }
  }

A third good book is Terence's new one on DSLs, Language Implementation Patterns. He describes various ways to use Antlr, for instance to write an abstract syntax tree generator for a compiler.

Jim Ferrans
A: 

So essentially, I want to invoke an ANTLR-produced PDDL parser on a PDDL file from inside a Java program and have it return an object that describes the PDDL file to the main Java program.

Is this possible?

Sure.

First you need to describe your language in an ANTLR grammar file. The easiest way to do this is with a combined grammar, from which ANTLR generates both a lexer and a parser for your language. When the language gets more complex, it is better to separate the two, but to start out, a single (combined) grammar file is easier to work with.

Let's say the PDDL language is just a very simple language: a succession of one or more numbers in hexadecimal (0x12FD), octal (0745) or decimal (12345) notation, separated by whitespace. This language can be described in the following ANTLR grammar file called PDDL.g:

grammar PDDL;

parse
  :  number+ EOF
  ;

number
  :  Hex
  |  Dec
  |  Oct
  ;

Hex
  :  '0' ('x' | 'X') ('0'..'9' | 'a'..'f' | 'A'..'F')+
  ;

Dec
  :  '0'
  |  '1'..'9' ('0'..'9')*
  ;

Oct
  :  '0' '0'..'7'+
  ;

Space
  :  (' ' | '\t' | '\r' | '\n'){$channel=HIDDEN;}
  ;

In this grammar, the rules whose names start with a capital letter (Hex, Dec, Oct and Space) are lexer rules. The other ones (parse and number) are parser rules.
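To get a feel for which text each of the three number lexer rules accepts, here is a rough plain-Java sketch that mirrors them with regular expressions. It is only an illustration: TokenShapes and classify are hypothetical names, the class is not part of ANTLR, and the generated lexer does not work this way internally. It also only checks whole strings, whereas a real lexer splits a stream of characters into tokens.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of ANTLR or the generated lexer.
final class TokenShapes {
    // Regex counterparts of the Hex, Dec and Oct lexer rules in PDDL.g.
    private static final Pattern HEX = Pattern.compile("0[xX][0-9a-fA-F]+");
    private static final Pattern DEC = Pattern.compile("0|[1-9][0-9]*");
    private static final Pattern OCT = Pattern.compile("0[0-7]+");

    // Returns the name of the rule whose pattern matches the whole text,
    // or null if none of the three patterns matches.
    static String classify(String text) {
        if (HEX.matcher(text).matches()) return "Hex";
        if (OCT.matcher(text).matches()) return "Oct";
        if (DEC.matcher(text).matches()) return "Dec";
        return null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0xcAfE", "0234", "66678", "0", "089"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Note that "089" matches none of the patterns as a whole, because 8 and 9 are not octal digits and a decimal number other than 0 may not start with 0.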

From this grammar, you can create a lexer and parser like this:

java -cp antlr-3.2.jar org.antlr.Tool PDDL.g

which produces (at least) the files PDDLParser.java and PDDLLexer.java.

Now create a little test class in which you can use these lexer and parser classes:

import org.antlr.runtime.*;
import java.io.*;
import java.util.*;

public class Main {
    public static void main(String[] args) throws Exception {
        File source = new File("source.txt");
        ANTLRInputStream in = new ANTLRInputStream(new FileInputStream(source));
        PDDLLexer lexer = new PDDLLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        PDDLParser parser = new PDDLParser(tokens);
        parser.parse();
    }
}

where the contents of the source.txt file might look like this:

0xcAfE 0234
66678 0X12 0777

Now compile all .java files:

javac -cp antlr-3.2.jar *.java

and run the main class:

// Windows
java -cp .;antlr-3.2.jar Main

// *nix/MacOS
java -cp .:antlr-3.2.jar Main

If all goes well, nothing is printed to the console.

Now, you said you want the parser to return certain objects based on the contents of your source file. Let's say we want our grammar to return a List<Integer>. This can be done by embedding "actions" in your grammar rules like this:

grammar PDDL;

parse returns [List<Integer> list]
@init{$list = new ArrayList<Integer>();}
  :  (number {$list.add($number.value);})+ EOF
  ;

number returns [Integer value]
  :  Hex {$value = Integer.parseInt($Hex.text.substring(2), 16);}
  |  Dec {$value = Integer.parseInt($Dec.text);}
  |  Oct {$value = Integer.parseInt($Oct.text, 8);}
  ;

Hex
  :  '0' ('x' | 'X') ('0'..'9' | 'a'..'f' | 'A'..'F')+
  ;

Dec
  :  '0'
  |  '1'..'9' ('0'..'9')*
  ;

Oct
  :  '0' '0'..'7'+
  ;

Space
  :  (' ' | '\t' | '\r' | '\n'){$channel=HIDDEN;}
  ;

As you can see, you can let rules return objects (returns [Type t]) and can embed plain Java code by wrapping it in { and }. The code in the @init part of the parse rule is placed at the start of the parse method in the PDDLParser.java file.
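The conversions done by the three actions in the number rule can also be tried out in plain Java, without generating a parser. The sketch below is only an approximation: NumberLiterals and toInt are hypothetical names, and because there is no lexer to tell it the token type, it guesses the notation from the prefix of the text instead.

```java
// Hypothetical helper for illustration; mirrors the actions in the number rule.
final class NumberLiterals {
    // Converts the text of a Hex, Oct or Dec token to an int.
    // Assumes the text already has the shape of one of those tokens.
    static int toInt(String text) {
        if (text.startsWith("0x") || text.startsWith("0X")) {
            // Hex: drop the "0x" prefix and parse the rest in base 16.
            return Integer.parseInt(text.substring(2), 16);
        }
        if (text.length() > 1 && text.startsWith("0")) {
            // Oct: a leading zero followed by more digits, parsed in base 8.
            return Integer.parseInt(text, 8);
        }
        // Dec: everything else is parsed in base 10.
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"0xcAfE", "0234", "66678", "0X12", "0777"}) {
            System.out.println(s + " = " + toInt(s));
        }
    }
}
```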

Test the new parser with this class:

import org.antlr.runtime.*;
import java.io.*;
import java.util.*;

public class Main {
    public static void main(String[] args) throws Exception {
        File source = new File("source.txt");
        ANTLRInputStream in = new ANTLRInputStream(new FileInputStream(source));
        PDDLLexer lexer = new PDDLLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        PDDLParser parser = new PDDLParser(tokens);
        List<Integer> numbers = parser.parse();
        System.out.println("After parsing :: "+numbers);
    }
}

and you'll see the following printed to the console:

After parsing :: [51966, 156, 66678, 18, 511]
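The same approach should work for returning your own class instead of a List<Integer>. As a sketch under assumptions: NumberFile below is a hypothetical stand-in for your PDDL representation, and the parse rule would then be declared as returns [NumberFile file], create the object in @init with $file = new NumberFile(); and call $file.add($number.value); in its action. I have not run that variant of the grammar. If your class lives in a package, the generated parser would also need an import for it, which can be added in an @header section of the grammar.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical domain class standing in for "my own representation of the file".
final class NumberFile {
    private final List<Integer> values = new ArrayList<Integer>();

    // Called once per parsed number, e.g. from an action in the grammar.
    void add(int value) {
        values.add(value);
    }

    // Read-only view for the rest of the program.
    List<Integer> values() {
        return Collections.unmodifiableList(values);
    }

    int size() {
        return values.size();
    }

    @Override
    public String toString() {
        return "NumberFile" + values;
    }

    public static void main(String[] args) {
        // Simulates what the parser actions would do while reading source.txt.
        NumberFile file = new NumberFile();
        file.add(51966);
        file.add(156);
        System.out.println(file);
    }
}
```

The main program would then receive the finished object from parser.parse() and never needs to know about tokens or grammar rules.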
Bart Kiers