tags:

views:

232

answers:

1

I'm attemping to learn language parsing for fun...

I've created a ANTLR grammar which I believe will match a simple language I am hoping to implement. It will have the following syntax:

<FunctionName> ( <OptionalArguments>+) {
     <OptionalChildFunctions>+
 }

Actual Example:

ForEach(in:[1,2,3,4,5] as:"nextNumber") {
   Print(message:{nextNumber})
}

I believe I have the grammar working correctly to match this construct, and now I am attemping to build an Abstract Syntax Tree for the language.

Firstly, I must admit I'm not exactly sure HOW this tree should look. Secondly, I'm at a complete loss how to do this in my Antlr grammar...I've been trying without much success for hours.

This is the current idea I'm going with for the tree - Please bare with me! :)

                   FunctionName
                  /          \
           Attributes         \
               / \          /  \ 
            ID    /\    ChildFunctions
           / \   ID etc
          /   \
  Attribute  AttributeValue
        Type

This is my current Antlr grammar file:

grammar Test;

options {output=AST;ASTLabelType=CommonTree;}

program : function ;
function : ID (OPEN_BRACKET (attribute (COMMA? attribute)*)? CLOSE_BRACKET)? (OPEN_BRACE function* CLOSE_BRACE)?;

attribute : ID COLON datatype;

datatype : NUMBER | STRING | BOOLEAN | array | lookup ;
array  :  OPEN_BOX (datatype (COMMA datatype)* )? CLOSE_BOX ;
lookup  : OPEN_BRACE (ID (PERIOD ID)*) CLOSE_BRACE;

NUMBER
 : ('+' | '-')? (INTEGER | FLOAT)
 ;

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

BOOLEAN
 : 'true' | 'TRUE' | 'false' | 'FALSE'
 ;

ID  : (LETTER|'_') (LETTER | INTEGER |'_')*
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

WHITESPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;

COLON : ':' ;
COMMA : ',' ;
PERIOD  :  '.' ;

OPEN_BRACKET : '(' ;
CLOSE_BRACKET : ')' ;

OPEN_BRACE : '{' ; 
CLOSE_BRACE : '}' ;

OPEN_BOX : '[' ;
CLOSE_BOX : ']' ;

fragment
LETTER
 : 'a'..'z' | 'A'..'Z' 
 ;

fragment
INTEGER
 : '0'..'9'+
 ;

fragment
FLOAT
 : INTEGER+ '.' INTEGER*
 ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    ;

ANY help / advice would be great. I've tried reading dozens of tutorials and nothing about the AST generation seems to stick :(

+4  A: 

Step 1 is to make the tree look like the little graph that you posted. Right now, you don't have any tree construction operators, so you're going to end up with a flat list.

See tree construction on the antlr.org website.

You can use ANTLRWorks to see what your getting for a parse tree and AST. Start adding tree construction operators and watch how things change.

EDIT / Additional Info:

Here's a process you can follow to give you a rough idea of how to do it:

  1. Download ANTLRWorks and use it's graphing facilities. You will definitely want to see the parse tree and the AST before and after you make changes. Once you understand how everything works, then you can use any IDE or editor you want.
  2. There are two basic operators for tree construction - The exclamation mark ! which tells the compiler to not place the node within the AST, and the carot ^, which tells ANTLR to make something the root node. Start by going through each non-terminal rule and deciding which elements don't need to be in the AST. For example, you don't need commas or parenthesis. Once you have all the information you can populate the a structure (or create your own AST structure) that provides all the information. Commas don't help any more, so add a ! to them. For example:

    function: ID (OPEN_BRACKET! (attribute (COMMA!? attribute)*)? CLOSE_BRACKET!)? (OPEN_BRACE! function* CLOSE_BRACE!)?;

  3. Take a look at the AST in ANTLRWorks before and after. Compare.

  4. Now decide which element should be the root node. It looks like you want ID to be the root node, so add a ^ after ID and compare in ANTLRWorks.

Here's a few changes that bring it closer to what I think you want:

program : function ;
function : ID^ (OPEN_BRACKET! attributeList? CLOSE_BRACKET!)? (OPEN_BRACE! function* CLOSE_BRACE!)?;
attributeList:  (attribute (COMMA!? attribute)*);
attribute : ID COLON! datatype;
datatype : NUMBER | STRING | BOOLEAN | array | lookup ;
array  :  OPEN_BOX! (datatype^ (COMMA! datatype)* )? CLOSE_BOX!;
lookup  : OPEN_BRACE! (ID (PERIOD! ID)*) CLOSE_BRACE!;

With that under your belt, now go look at some of the tutorials.

Kaleb Pederson
This is the part which I'm lost at I'm afraid. I've read the tree construction page and didn't really understand it :(
Richie_W
Thanks for the edit. That added a lot of useful information.
Richie_W
Really great explanation. I'm also a beginner with ANTLR and I believe the site is somehow a way to tell you to buy the book. Thanks
Iulian Şerbănoiu