How can I handle the case where the token 'for' is used in two different ways in the language being parsed: as a statement keyword and as a plain "parameter" (argument text), as in the following example?

echo for print example
for i in {0..10..2}
  do
     echo "Welcome $i times"
 done

Output:

for print example
Welcome 0 times
Welcome 2 times
Welcome 4 times
Welcome 6 times
Welcome 8 times
Welcome 10 times

Thanks.

A: 

Well, it's pretty easy. Most grammars use something like this:

TOKEN_REF
    :   'A'..'Z' ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
    ;

So when referring to a print statement you would do something like:

'print' (TOKEN_REF)*

And with a for statement you just explicitly state 'for', such as:

'for' INT 'in' SOMETHING
wvd
No, I can't see that working. Let's say there is a `FOR` rule that looks like `FOR : 'for';`. Now if `FOR` gets tokenized before `TOKEN_REF`, then `TOKEN_REF` could never contain the characters `for` (and therefore `'print' (TOKEN_REF)*` can never contain `'for'`). But if `TOKEN_REF` gets tokenized before `FOR`, then the `FOR` rule will never be matched since `TOKEN_REF` will always match `for`. Did you try what you suggested? If so, did it work and would you care to post it? Thanks.
Bart Kiers
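The ordering problem described in the comment above can be reproduced outside ANTLR. The following is a minimal Python sketch, not ANTLR itself: it assumes a toy lexer that tries its rules in order and takes the first one that matches. The names `toy_tokenize`, `FOR`, `PRINT` and `WORD` are made up for this illustration.

```python
import re

def toy_tokenize(text, rules):
    """Toy lexer: at each position, try the rules in order and take the first match."""
    tokens, pos = [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # skip whitespace between tokens
            continue
        for name, pattern in rules:
            match = re.compile(pattern).match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"no rule matches at position {pos}")
    return tokens

# Hypothetical rules standing in for the keyword rules and the general word rule.
FOR = ("FOR", r"for\b")
PRINT = ("PRINT", r"print\b")
WORD = ("WORD", r"[A-Za-z_][A-Za-z_0-9]*")

# Keywords first: 'for' always becomes FOR, so it can never be a WORD after 'print'.
keyword_first = toy_tokenize("print for", [PRINT, FOR, WORD])

# General word rule first: every word becomes WORD, so FOR is never produced.
word_first = toy_tokenize("print for", [WORD, PRINT, FOR])

print(keyword_first)
print(word_first)
```

Under this first-match assumption, neither ordering lets `for` be a keyword in one place and plain text in another, which is the objection raised in the comment.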
A: 

The only way I can see to do this is to define an Echo rule in your lexer grammar that matches the characters echo followed by all other characters except \r and \n:

Echo
  :  'echo' ~('\r' | '\n')+
  ;

and make sure that rule is before the rule that matches identifiers and keywords (like for).
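To illustrate why the position of the Echo rule matters, here is a small Python sketch of the same idea using an ordered regular-expression alternation. It is only an analogy for the lexer behaviour, not generated ANTLR code, and the names `TOKEN_SPEC`, `MASTER` and `echo_first_tokenize` are assumptions made for this example.

```python
import re

# Order matters: ECHO is tried before the keyword and identifier patterns,
# so everything after 'echo' on the same line ends up inside one ECHO token.
TOKEN_SPEC = [
    ("ECHO", r"echo[^\r\n]+"),
    ("FOR", r"for\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("NEWLINE", r"\r\n|\n|\r"),
    ("SPACE", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def echo_first_tokenize(text):
    """Return (token name, text) pairs, dropping SPACE tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        if match.lastgroup != "SPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

tokens = echo_first_tokenize("echo for print example\nfor i")
print(tokens)
```

In this sketch the `for` on the first line is swallowed by the ECHO token, while the `for` at the start of the second line is still recognised as the keyword.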

A quick demo of a possible start would be:

grammar Test;

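parse
  :  (echo | forStat)*
  ;

echo
  :  Echo (NewLine | EOF)
  ;

// named forStat because a rule called 'for' would clash with the Java keyword in the generated parser
forStat
  :  For Identifier In range NewLine
     Do NewLine
     echo
     Done (NewLine | EOF)
  ;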

range
  :  '{' Integer '..' Integer ('..' Integer)? '}'
  ;

Echo
  :  'echo' ~('\r' | '\n')+
  ;

For  : 'for';
In   : 'in';
Do   : 'do';
Done : 'done';

Identifier
  :  ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')*
  ;

Integer
  :  '0'..'9'+
  ;

NewLine
  :  '\r' '\n'
  |  '\n'
  |  '\r'
  ;

Space
  :  (' ' | '\t') {skip();}
  ;

If you parse the input:

echo for print example
for i in {0..10..2}
do
  echo "Welcome $i times"
done
echo the end for now!

with it, the resulting parse tree would look like:

[image of the parse tree]

(I had to rotate the image a bit, otherwise it wouldn't be visible at all!)

HTH.

Bart Kiers
A: 

To do that, you need a semantic predicate so that the lexer rule is only taken when the text really is the for keyword.

Details are available on the keywords as identifiers page on the ANTLR wiki.
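The idea behind such a predicate can be sketched in Python. This is only a rough illustration of a context check deciding whether `for` is a keyword. It is not the code from the wiki page, and the condition used here (`for` counts as a keyword only when it is the first word on a line) is an assumption made for the example, as is the name `classify`.

```python
import re

WORD = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

def classify(lines):
    """Tag each word; 'for' is tagged FOR only when the predicate holds."""
    tokens = []
    for line in lines:
        for index, word in enumerate(WORD.findall(line)):
            # Hypothetical predicate: 'for' is a keyword only as the first word of a line.
            is_keyword = word == "for" and index == 0
            tokens.append(("FOR" if is_keyword else "WORD", word))
    return tokens

tokens = classify(["echo for print example", "for i in"])
print(tokens)
```

Here the same text `for` yields a WORD on the first line and a FOR on the second, because the decision depends on context and not on the characters alone.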

Kaleb Pederson