ansaurus

Question

ANTLR (or alternative): decoupling parsing from evaluation

Answer 1

+3 A:

What about using ANTLR AST (Abstract Syntax Tree) and constructing a mirrored tree with your classes by visiting each tree node.

@Giuseppe Cardone added some great links that I post here:

http://www.antlr.org/article/1100569809276/use.tree.grammars.tml

http://www.antlr.org/article/1170602723163/treewalkers.html

An example can be found at:

http://sagarsunkle.spaces.live.com/blog/cns!E07F3B561597E4EE!664.entry?sa=97619042

smink 2010-10-11 21:35:54

Hmm. I'm new to ANTLR so I'm not familiar with how to go about doing what you suggest, or what advantages/disadvantages it might have.

Jason S 2010-10-11 21:37:49

I wanted to elaborate further the question, but basically I agree with @smink : building an AST (maybe using the `output=AST` ANTLR option) and then checking/evaluating/compiling it using a tree walker is the easiest way to decouple grammar from code.

Giuseppe Cardone 2010-10-11 22:16:19

+1 @Giuseppe Cardone for the links. I used this technique in a previous project and it works fine.

smink 2010-10-11 22:23:52

Thanks for the elaboration. I don't mean to critical in an unconstructive way, but I don't think either the "use.tree.grammars.tml" or "treewalkers.html" links are helpful. They seem to be an argumentative debate about whether ANTLR itself takes the right approach: the first says "visitors are not a good solution" and "I like the grammar + actions strategy of ANTLR" (which is exactly the problem I'm having); the second seems to be a rebuttal and suggests that there should be an alternative to the way that ANTLR does things, but doesn't suggest one.

Jason S 2010-10-11 22:53:48

The example link (at sagarsunkle.spaces.live.com) is useful... I sort of get it... but I sort of don't. :/

Jason S 2010-10-11 22:55:57

Answer 2

+2 A:

The examples you mention embed parser actions right inside the grammar for the sake of conciseness. This works fine for small projects. For bigger ones, you'd prefer to make an AST first and then do whatever you want with it. You can do this, hehe, by embedding actions that create the tree, but antlr provides a nicer, declarative way:

http://www.antlr.org/wiki/display/ANTLR3/Tree+construction

You can then use a Tree Grammar to generate code, e.g. with StringTemplate. I've used this toolchain for my thesis and it worked like a charm. But I bet I'd have suffered a lot without having the Anlr3 Reference Book ( http://pragprog.com/titles/tpantlr/the-definitive-antlr-reference )

I also found the lecture notes linked on the antlr page really useful: http://www.antlr.org/wiki/display/CS652/CS652+Home

Also, make use of AntlrWorks to test your grammar. There's also a grammar unit testing suite available. Plus, the antlr mailing list is really active, and Terence Parr actively responds to most posts. Plus, it's a lot of fun.

zedoo 2010-10-11 23:26:40

Answer 3

+2 A:

Jason S wrote:

Is there a way to do this?

Yes.

First define your grammar (I took your example of an expression parser with only the + and * and () operators):

grammar Exp;

// parser rules
parse
  :  additionExp
  ;

additionExp
  :  multiplyExp (Add multiplyExp)*
  ;

multiplyExp
  :  atomExp (Mult atomExp)* 
  ;

atomExp
  :  Number
  |  LParen additionExp RParen
  ;

// lexer rules
Add    : '+' ;
Mult   : '*' ;
LParen : '(' ;
RParen : ')' ;   
Number : ('0'..'9')+ ('.' ('0'..'9')+)? ;
Spaces : (' ' | '\t' | '\r'| '\n') {$channel=HIDDEN;} ;

If you want to let ANTLR generate a proper AST from the grammar above, you must put the following at the top of your grammar (under the grammar declaration):

options { 
  output=AST; 
}

and you must indicate what the root of each of your parse rules should be. This can be done in two ways:

by using rewrite rules;
or by placing one of the "inline tree-operators" ^ and ! after the tokens:
- ^ means: make this token the root;
- ! means: exclude this token from the AST.

Now your grammar would look like this:

grammar Exp;

options { 
  output=AST; 
}

// parser rules
parse
  :  additionExp
  ;

additionExp
  :  multiplyExp (Add^ multiplyExp)*
  ;

multiplyExp
  :  atomExp (Mult^ atomExp)* 
  ;

atomExp
  :  Number
  |  LParen! additionExp RParen!
  ;

// lexer rules
Add    : '+' ;
Mult   : '*' ;
LParen : '(' ;
RParen : ')' ;   
Number : ('0'..'9')+ ('.' ('0'..'9')+)? ;
Spaces : (' ' | '\t' | '\r'| '\n') {$channel=HIDDEN;} ;

As you can see, I made the Add and Mult roots, and excluded the parenthesis.

Now generate a lexer & parser from the grammar:

java -cp antlr-3.2.jar org.antlr.Tool Exp.g

create a little test harness:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import java.util.*;

public class Main {

    private static void preOrder(CommonTree tree, int depth) {
        for(int i = 0; i < depth; i++) {
            System.out.print("- ");
        }
        System.out.println("> "+tree + " :: " + ExpParser.tokenNames[tree.getType()]);
        List children = tree.getChildren();
        if(children == null) return;
        for(Object o : children) {
            preOrder((CommonTree)o, depth+1);
        }
    }

    public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("3 * (4 + 7 * 6) * (3 + 7 * (4 + 2))");
        ExpLexer lexer = new ExpLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        ExpParser parser = new ExpParser(tokens);
        CommonTree tree = (CommonTree)parser.parse().getTree();
        preOrder(tree, 0);
    }
}

compile everything:

javac -cp antlr-3.2.jar *.java

and run the Main class:

// *nix/Mac OS
java -cp .:antlr-3.2.jar Main

// Windows
java -cp .;antlr-3.2.jar Main

which produces the following:

> * :: Mult
- > * :: Mult
- - > 3 :: Number
- - > + :: Add
- - - > 4 :: Number
- - - > * :: Mult
- - - - > 7 :: Number
- - - - > 6 :: Number
- > + :: Add
- - > 3 :: Number
- - > * :: Mult
- - - > 7 :: Number
- - - > + :: Add
- - - - > 4 :: Number
- - - - > 2 :: Number

As you can see, the parse rule (method) returns a CommonTree object you can use to create your own walker/visitor leaving the grammar as is.

HTH

Bart Kiers 2010-10-12 07:05:39

+1. Thanks for posting the step-by-step example, that's more or less what I needed. All the other examples I looked at had imperative actions in the .g file and I couldn't figure out what syntax was an ANTLR-ism vs. Java fragments that could be deleted.

Jason S 2010-10-12 12:20:47

p.s. what part(s) of org.antlr.stringtemplate are you using? are they recommended/necessary?

Jason S 2010-10-12 12:21:53

Hmm. Each of your tree nodes *prints* as a string... but what is the node? how would I figure out whether a node is a multiplyExp or an additionExp or something else? I'll give it a whirl through the debugger when I get a chance, but if there's a short + obvious answer, I'd greatly appreciate it.

Jason S 2010-10-12 12:26:40

@Jason, the contents of the `CommonTree` is used in its `toString(): String` method. You can get the type through its `getType(): int` method (I changed my sample slightly). And no, there is no need for the `StringTemplate` import: there were accidentally there from some other demo I posted (I removed the imports as well).

Bart Kiers 2010-10-12 12:39:26

OK, so the tree is a tree of *tokens*, not of parser expressions.

Jason S 2010-10-12 13:21:23

@Jason, yes, that is correct. Either re-build the AST with your desired tokens, or create a class that extends `CommonTree`, say `ExpTree`, add extra functionality in it (like an (abstract) `evaluate()` method) and create some sub-classes from that `ExpTree` (like `Term`, `Sum` and `Product`). More info on the latter can be found here: [Using custom AST node types](http://www.antlr.org/wiki/display/ANTLR3/Tree+construction#Treeconstruction-UsingcustomASTnodetypes)

Bart Kiers 2010-10-12 13:57:23

+1, Great job......

smink 2010-10-12 22:25:29

@smink, thank you!

Bart Kiers 2010-10-13 06:20:40

ansaurus

tags:

views:

answers:

ANTLR (or alternative): decoupling parsing from evaluation

related questions