I'm playing with ANTLR and decided to try parsing JavaScript with it, but I hit a wall dealing with the optional ';', where the end of a statement can be marked by a newline instead. Can this be handled in some straightforward way?

Here is a simple grammar example that doesn't work:

grammar optional_newline;
def         : statements ;
statements  : statement (statement)* ;
statement   : expression (';' | '\n') ;
expression  : ID | INT | 'var' ID '=' INT ;
ID          : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
INT         : '0'..'9'+ ;
WS          : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;

I want to be able to parse this input, which JavaScript parsers accept:

var i = 
10
10;

PS: I don't want to put WS in the parser rules; I would be much happier if the lexer just got rid of it.

A: 

I'm not sure this will work in all the cases possible in JavaScript, but it correctly parses your example:

grammar js;

def         : statements ;
statements  : statement (statement)* ;
statement   : expression ';'? ;
expression  : ID | INT | 'var' ID '=' INT ;
ID          : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;
INT         : '0'..'9'+ ;
WS          : ( ' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;


stmax
But it will also parse `10 10`, which is not allowed :)
vava
Could you explain where a semicolon is required and where it is optional? I'd have to look at the JavaScript syntax for the details, which might take a while.
stmax
It's more or less simple: every ';' that marks the end of a statement can be replaced by a newline. In all other cases newlines are just ignored.
vava
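
For illustration, the rule from the last comment (a ';' that ends a statement may be replaced by a newline, and newlines are ignored everywhere else) can be sketched outside ANTLR as a small hand-written parser for the toy grammar in the question. This is not ANTLR code and not the answerer's method. It is a minimal sketch that assumes the lexer drops whitespace but remembers, for each token, whether a newline preceded it. The names `tokenize`, `parse`, and `newline_before` are hypothetical, and real JavaScript semicolon insertion has more cases than this.

```python
import re

# One alternative per lexeme kind: newline, other whitespace, or a real token.
TOKEN_RE = re.compile(
    r"(?P<nl>\n)|(?P<ws>[ \t\r]+)|(?P<tok>[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[=;])"
)


def tokenize(src):
    """Return (text, newline_before) pairs; whitespace itself is dropped."""
    tokens, pos, newline_before = [], 0, False
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r}")
        if m.lastgroup == "nl":
            newline_before = True  # remember it for the next real token
        elif m.lastgroup == "tok":
            tokens.append((m.group(), newline_before))
            newline_before = False
        pos = m.end()
    return tokens


def is_id(text):
    return text != "var" and (text[0].isalpha() or text[0] == "_")


def is_int(text):
    return text.isdigit()


def parse(src):
    """Parse src into a list of statements, each a list of token texts."""
    tokens = tokenize(src)
    statements, i = [], 0

    def expect(pred, what):
        nonlocal i
        if i >= len(tokens) or not pred(tokens[i][0]):
            raise SyntaxError(f"expected {what} at token {i}")
        i += 1

    while i < len(tokens):
        start = i
        # expression : ID | INT | 'var' ID '=' INT
        if tokens[i][0] == "var":
            i += 1
            expect(is_id, "identifier")
            expect(lambda t: t == "=", "'='")
            expect(is_int, "integer")
        else:
            expect(lambda t: is_id(t) or is_int(t), "expression")
        statements.append([t for t, _ in tokens[start:i]])
        # Terminator: ';', a newline before the next token, or end of input.
        if i < len(tokens) and tokens[i][0] == ";":
            i += 1
        elif i < len(tokens) and not tokens[i][1]:
            raise SyntaxError(f"expected ';' or newline before {tokens[i][0]!r}")
    return statements
```

With this sketch the input from the question splits into two statements, while `10 10` on one line is rejected because nothing separates the two expressions. The newline after `var i =` is ignored simply because the statement is not complete at that point, so the parser never checks the flag there.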