views:

42

answers:

2

I might be asking a stupid/basic question but i had been confused about ANTLR AST building.

What i want to made is a kind of Boolean expression parser such that on parent nodes i have operator and its operands as children. for instance, a sentence

( ( A B C & D ) | ( E & ( F | G ) ) )

should ideally be representing

              |
             / \
            /   \
           /     \
          /       \
         &         &
        / \       / \
       /   \     /   \
      /     D   E     |
     /|\             / \
    A B C           /   \
                   F     G 

From the following grammar.

grammar Test;

options
{
   language = 'Java';
   output=AST;
}


exp    :    word (expRest^)? | '('! exp ')'! (expRest^)?  ;

expRest :    (('&'|'|'|'!'|'&!'|'|!')^) exp | (('~'^) digit+ exp);
word    :   letter letter* -> ^(letter letter*);
letter  :        '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9'|'a'|'b'|'c'|'d'|'e'|'f'|'g'|'h'|'i'|'j'|'k'|'l'|'m'|'n'|'o'|'p'|'q'|'r'|'s'|'t'|'u'|'v'|'w'|'x'|'y'|'z'|'A'|'B'|'C'|'D'|'E'|'F'|'G'|'H'|'I'|'J'|'K'|'L'|'M'|'N'|'O'|'P'|'Q'|'R'|'S'|'T'|'U'|'V'|'W'|'X'|'Y'|'Z';
digit   :    '0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9';

The problem is, that i am getting 'A B C' as either a list(array) of nodes as children of '&'.

Is it possible to restrict it as a single string??? i.e. 'A B C' or in other words, is it possible to have multiple characters at root node in AST??? If yes then how can i achieve it?

for reference, i want to make a syntax tree of 'risk factors & current economic state'

P.S. I have also tried :

word    :   (letter letter*)^ ;

And just for a reference, I am using .NET environment.

A: 

If you want 'A B C' as a single node, then define letter to include ' ' between characters like:

letter : character (space character)*;
character : '0'..'9'|'a'..'z'|'A'..'Z';
space : ' ';

Which will include spaces as children of the letter node.

Wesley Wiser
apparently, there was no issue in writing above grammar but i am unable to debug with above grammar. Don't know what is wrong with it...(i am using ANTLRWorks 1.4 for debugging)
Umer
+2  A: 

You can insert imaginary tokens in your grammar that will be the root of "groups" of words. I don't think it's a good idea to glue the A, B and C together since you're probably need them separate, right?

I couldn't really figure out what you're exactly trying to do, so here a little demo you can (try to) get your head around:

grammar BoolExp;

options { 
  output=AST; 
}

tokens {
  MultiWord;
}

parse
  :  booleanExp EOF!
  ;

booleanExp
  :  orExp
  ;

orExp
  :  andExp ('|'^ andExp)*
  ;

andExp
  :  notExp ('&'^ notExp)*
  ;

notExp
  :  '!'^ atom
  |  atom
  ;

atom
  :  '(' booleanExp ')' -> booleanExp
  |  WORD WORD+         -> ^(MultiWord WORD+)
  |  WORD
  ;

WORD
  :  ('a'..'z' | 'A'..'Z')+
  ;

SPACE
  :  (' ' | '\t' | '\r' | '\n'){skip();}
  ;

If you generate a parser from it and test it with the input:

( ( A B C & D ) | ( E & ( F | G ) ) )

you'll get the following AST:

alt text

I did not post my (Java) test class that generated the DOT file that was used to create the AST image above since you said you're using the .NET target. If you do want to have a look at it, leave a comment and I'll post it as well.

Bart Kiers
I guess it will be work but let me try and let you know. Is there any difference in word that i used and 'WORD' that you're using??? When i used 'WORD' it was colored blue instead of red (unlike other non-terminals).
Umer
I guess you mean that some rules are colored blue and some red ANTLRWorks, right? (haven't used ANTLRWorks in a along while). Rules that start with a capital are lexer rules, the ones that start with a lowercase letter are parser rules. HTH.
Bart Kiers
ahhh, right. Great. Actually the MultiWord hint has done the trick. i select all children as space separated whenever there is a multiword text as node text and children are leaf nods. Thanks for helping me.
Umer
@umer, you're welcome.
Bart Kiers