I have a left-recursion issue in my ANTLR grammar. While I think I understand why there is a problem, I am unable to think of a solution. The issue is with the last alternative of my datatype rule (datatype PLUS datatype). I have included the entire grammar below:

grammar Test;

options {output=AST;ASTLabelType=CommonTree;}
tokens {FUNCTION; ATTRIBUTES; CHILDREN; COMPOSITE;}

program     :   function ;
function    :   ID (OPEN_BRACKET (attribute (COMMA? attribute)*)? CLOSE_BRACKET)? (OPEN_BRACE function* CLOSE_BRACE)? SEMICOLON? -> ^(FUNCTION ID ^(ATTRIBUTES attribute*) ^(CHILDREN function*)) ;

attribute   :   ID (COLON | EQUALS)  datatype -> ^(ID datatype);

datatype    :   ID      ->  ^(STRING["id"] ID)
            |   NUMBER  ->  ^(STRING["number"] NUMBER)
            |   STRING  ->  ^(STRING["string"] STRING)
            |   BOOLEAN ->  ^(STRING["boolean"] BOOLEAN)
            |   array   ->  ^(STRING["array"] array)
            |   lookup  ->  ^(STRING["lookup"] lookup)
            |   datatype PLUS datatype -> ^(COMPOSITE datatype datatype) ;

array       :   OPEN_BOX (datatype (COMMA datatype)*)? CLOSE_BOX -> datatype* ;
lookup      :   OPEN_BRACE (ID (PERIOD ID)*) CLOSE_BRACE -> ID* ;

NUMBER
    :   ('+' | '-')? (INTEGER | FLOAT)
    ;

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

BOOLEAN
    :   'true' | 'TRUE' | 'false' | 'FALSE'
    ;

ID  :   (LETTER|'_') (LETTER | INTEGER |'_')*
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

WHITESPACE  :   (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;

COLON   :   ':' ;
SEMICOLON   :   ';' ;

COMMA   :   ',' ;
PERIOD  :   '.' ;
PLUS    :   '+' ;
EQUALS  :   '=' ;   

OPEN_BRACKET    :   '(' ;
CLOSE_BRACKET   :   ')' ;

OPEN_BRACE  :   '{' ;   
CLOSE_BRACE :   '}' ;

OPEN_BOX    :   '[' ;
CLOSE_BOX   :   ']' ;

fragment
LETTER
    :   'a'..'z' | 'A'..'Z' 
    ;

fragment
INTEGER
    :   '0'..'9'+
    ;

fragment
FLOAT
    :   INTEGER+ '.' INTEGER*
    ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    ;
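To see why the last alternative of datatype is a problem, here is a minimal Python sketch of what a top-down (recursive-descent) parser would do if it followed that alternative literally. It is not ANTLR-generated code, and the names parse_datatype and try_parse are hypothetical. The alternative begins by invoking datatype itself, so the function calls itself at the same token position and never makes progress. This is why ANTLR 3, which generates top-down LL(*) parsers, rejects the rule.

```python
def parse_datatype(tokens, pos):
    """Naive recursive-descent translation of the left-recursive alternative

        datatype : datatype PLUS datatype ;

    Only this alternative is modelled. Because it starts by invoking
    datatype itself, the function calls itself at the same position and
    never consumes a token.
    """
    left, pos = parse_datatype(tokens, pos)  # no token consumed before this call
    if pos < len(tokens) and tokens[pos] == "PLUS":
        right, pos = parse_datatype(tokens, pos + 1)
        return ("COMPOSITE", left, right), pos
    return left, pos


def try_parse(tokens):
    """Report whether parse_datatype finishes or recurses without progress."""
    try:
        parse_datatype(tokens, 0)
        return "parsed"
    except RecursionError:
        # Python's recursion limit plays the role of ANTLR's analysis error here.
        return "infinite recursion"


print(try_parse(["NUMBER", "PLUS", "NUMBER"]))  # infinite recursion
```

Whatever the input is, pos is still 0 on every nested call, so the recursion can only end by hitting Python's recursion limit.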

I am developing in ANTLRWorks, which provides a function to remove left recursion, but unfortunately it does not seem to work :s

Any help would be great.

Thanks.

EDIT:

Here is an example of the language I'm trying to implement/parse:

<FunctionName> <OptionalAttributes> <OptionalChildFunctions>

So for example:

ForEach(in:[1,2,3,4,5] as:"i") {
  Switch(value:{i}) {
    Case(value:3) {
      Print(message:"This is the number 3")
    }

    Default {
      Print(message:"This isn't the number 3")
    }
  }
}
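As a rough illustration of the token stream the parser rules work on, here is a Python approximation of the question's lexer rules applied to pieces of the example. The tokenize helper is hypothetical, not ANTLR output. Python's re module takes the first alternative that matches rather than the longest match, so the pattern order is chosen by hand, and BOOLEAN is recognised by re-tagging an ID whose text is true/TRUE/false/FALSE (mimicking ANTLR preferring the earlier rule when two rules match the same length).

```python
import re

# Patterns adapted from the question's lexer rules. Order matters: re picks
# the first alternative that matches, not the longest one.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\r\n]*|/\*.*?\*/"),
    ("WHITESPACE", r"[ \t\r\n]+"),
    ("STRING", r'"(?:\\[btnfr"\'\\]|[^\\"])*"'),
    ("NUMBER", r"[+-]?[0-9]+(?:\.[0-9]*)?"),
    ("ID", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("COLON", r":"), ("SEMICOLON", r";"), ("COMMA", r","),
    ("PERIOD", r"\."), ("PLUS", r"\+"), ("EQUALS", r"="),
    ("OPEN_BRACKET", r"\("), ("CLOSE_BRACKET", r"\)"),
    ("OPEN_BRACE", r"\{"), ("CLOSE_BRACE", r"\}"),
    ("OPEN_BOX", r"\["), ("CLOSE_BOX", r"\]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC), re.DOTALL
)
BOOLEANS = {"true", "TRUE", "false", "FALSE"}


def tokenize(text):
    """Split text into (token_type, text) pairs, dropping hidden tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind in ("COMMENT", "WHITESPACE"):
            continue  # the grammar sends these to the HIDDEN channel
        if kind == "ID" and value in BOOLEANS:
            kind = "BOOLEAN"  # same text length: the earlier BOOLEAN rule wins
        tokens.append((kind, value))
    return tokens


print(tokenize('Switch(value:{i})'))
# [('ID', 'Switch'), ('OPEN_BRACKET', '('), ('ID', 'value'), ('COLON', ':'),
#  ('OPEN_BRACE', '{'), ('ID', 'i'), ('CLOSE_BRACE', '}'), ('CLOSE_BRACKET', ')')]
```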
A:

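Okay, this should do the trick. The left-recursive alternative (datatype PLUS datatype) is replaced by an expression rule that matches one atom followed by zero or more PLUS atom pairs, so no rule calls itself before consuming a token: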

grammar Test;

/************************************** PARSER **************************************/
program
    :   function EOF 
    ;

function
    :   ID (OPEN_PAREN (attribute (COMMA? attribute)*)? CLOSE_PAREN)?   // COMMA? : the question's example has no comma between attributes
        (OPEN_BRACE function* CLOSE_BRACE)?
        SEMICOLON?
    ;

attribute
    :   ID (COLON | EQUALS)? expression
    ;

expression
    :   atom (PLUS atom)*
    ;

atom
    :   ID
    |   STRING
    |   BOOLEAN
    |   NUMBER
    |   array
    |   lookup
    ;

array
    :   OPEN_BOX (expression (COMMA expression)*)? CLOSE_BOX
    ;

lookup
    :   OPEN_BRACE (ID (PERIOD ID)*) CLOSE_BRACE
    ;

/************************************** LEXER **************************************/
NUMBER          :   ('+' | '-')? (INTEGER | FLOAT)
                ;

STRING          :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
                ;

BOOLEAN         :   'true' | 'TRUE' | 'false' | 'FALSE'
                ;

ID              :   (LETTER|'_') (LETTER | INTEGER |'_')*
                ;

COMMENT         :   '//' ~('\n'|'\r')* ('\r'? '\n'| EOF) {$channel=HIDDEN;}
                |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
                ;

WHITESPACE      :   (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;

COLON           :   ':' ;
SEMICOLON       :   ';' ;

COMMA           :   ',' ;
PERIOD          :   '.' ;
PLUS            :   '+' ;
EQUALS          :   '=' ;   

OPEN_PAREN      :   '(' ;
CLOSE_PAREN     :   ')' ;

OPEN_BRACE      :   '{' ;   
CLOSE_BRACE     :   '}' ;

OPEN_BOX        :   '[' ;
CLOSE_BOX       :   ']' ;

fragment 
LETTER          :   'a'..'z' | 'A'..'Z' ;
fragment
INTEGER         :   '0'..'9'+ ;
fragment
FLOAT           :   INTEGER+ '.' INTEGER* ;
fragment
ESC_SEQ         :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\') ;
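The key change is that expression : atom (PLUS atom)* is a loop rather than a recursive call. Here is a hedged Python sketch of how a hand-written recursive-descent parser could implement the expression, atom, array and lookup rules above; the Parser class, its method names and the tuple-based tree are hypothetical illustrations, and tokens are assumed to arrive as (type, text) pairs. Because this grammar drops the question's -> rewrite rules, the sketch builds a COMPOSITE node by hand, similar to the question's ^(COMPOSITE datatype datatype), nested to the left.

```python
class Parser:
    """Hypothetical recursive-descent parser for the answer's expression rules."""

    def __init__(self, tokens):
        self.tokens = tokens  # list of (token_type, text) pairs
        self.pos = 0

    def peek(self):
        """Type of the next token, or "EOF" when input is exhausted."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, kind):
        """Consume and return the next token if it has the given type."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expression(self):
        # expression : atom (PLUS atom)* ;
        # One atom is consumed before the loop, so expression() never
        # re-enters itself without making progress.
        tree = self.atom()
        while self.peek() == "PLUS":
            self.expect("PLUS")
            # Fold to the left: a + b + c -> COMPOSITE(COMPOSITE(a, b), c)
            tree = ("COMPOSITE", tree, self.atom())
        return tree

    def atom(self):
        # atom : ID | STRING | BOOLEAN | NUMBER | array | lookup ;
        kind = self.peek()
        if kind in ("ID", "STRING", "BOOLEAN", "NUMBER"):
            return self.expect(kind)
        if kind == "OPEN_BOX":
            return self.array()
        if kind == "OPEN_BRACE":
            return self.lookup()
        raise SyntaxError(f"unexpected {kind}")

    def array(self):
        # array : OPEN_BOX (expression (COMMA expression)*)? CLOSE_BOX ;
        self.expect("OPEN_BOX")
        items = []
        if self.peek() != "CLOSE_BOX":
            items.append(self.expression())
            while self.peek() == "COMMA":
                self.expect("COMMA")
                items.append(self.expression())
        self.expect("CLOSE_BOX")
        return ("array", items)

    def lookup(self):
        # lookup : OPEN_BRACE (ID (PERIOD ID)*) CLOSE_BRACE ;
        self.expect("OPEN_BRACE")
        ids = [self.expect("ID")[1]]
        while self.peek() == "PERIOD":
            self.expect("PERIOD")
            ids.append(self.expect("ID")[1])
        self.expect("CLOSE_BRACE")
        return ("lookup", ids)


tokens = [("STRING", '"a"'), ("PLUS", "+"), ("ID", "b"), ("PLUS", "+"), ("NUMBER", "1")]
print(Parser(tokens).expression())
# ('COMPOSITE', ('COMPOSITE', ('STRING', '"a"'), ('ID', 'b')), ('NUMBER', '1'))
```

Folding inside the while loop keeps + left-associative; collecting the atoms into a flat list would be an equally valid design if the order of evaluation does not matter.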

Note that I've renamed OPEN_BRACKET and CLOSE_BRACKET to OPEN_PAREN and CLOSE_PAREN. The round ones, ( and ), are parentheses; the square ones, [ and ], are called brackets (the ones you called boxes, but calling them boxes doesn't hurt IMO).

Bart Kiers
Thanks Bart. Yup, you seem to have nailed it! I should be good from here! Thanks man! Much appreciated :)
Richie_W
Good to hear it Richie, and you're welcome.
Bart Kiers