I am trying to parse a data file in ANTLR. It contains variable amounts of whitespace, including optional leading whitespace, for example:

 3 6
  97   12
 15 18

The following shows where each line starts (^) and ends ($). There is a newline at the end of the file and there are no tabs.

^ 3 6$
^  97   12$
^ 15 18$
^

My grammar is:

lines   : line+;
line    : ws1 {System.out.println("WSOPT :"+$ws1.text+":");} 
                num1 {System.out.println("NUM1 "+$num1.text);} 
                ws2 {System.out.println("WS :"+$ws2.text+":");}
                num2 {System.out.println("NUM2 "+$num2.text);} 
                NEWLINE
    ;
num1    :  INT ;
num2    :  INT ;
ws1 : WSOPT;
ws2 : WS;

INT     : '0'..'9'+;
NEWLINE :    '\r'? '\n';
//WS    : (' '|'\t' )+ ;
WS  : (' ')+ ;
WSOPT   : (' ')* ;

which gives:

line 1:0 mismatched input ' ' expecting WSOPT
WSOPT :null:
NUM1 3
WS : :
NUM2 6
line 2:0 mismatched input '   ' expecting WSOPT
WSOPT :null:
NUM1 97
WS :   :
NUM2 12
BUILD SUCCESSFUL (total time: 1 second)

(i.e. the leading whitespace has not been recognised as WSOPT, and the last line has been missed).
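A likely cause of the "expecting WSOPT" errors is that WS and WSOPT both match exactly the same run of spaces. When two lexer rules match the same input, ANTLR picks the rule defined first, so every run of spaces becomes a WS token and WSOPT is never produced. The Java sketch below is a hypothetical, simplified model of that selection rule (longest match, earlier rule wins a tie, empty matches ignored). It is not ANTLR's generated code, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.ToIntBiFunction;

// Hypothetical, simplified model of how a lexer chooses between rules.
// It is NOT ANTLR's generated code; class and method names are illustrative.
class RulePriorityDemo {
    static final class Rule {
        final String name;
        final ToIntBiFunction<String, Integer> matcher; // number of chars matched at a position
        Rule(String name, ToIntBiFunction<String, Integer> matcher) {
            this.name = name;
            this.matcher = matcher;
        }
    }

    // Length of the run of characters starting at pos that satisfy test.
    static int run(String s, int pos, IntPredicate test) {
        int i = pos;
        while (i < s.length() && test.test(s.charAt(i))) i++;
        return i - pos;
    }

    // Same order as the lexer rules in the question's grammar.
    static final List<Rule> RULES = List.of(
        new Rule("INT", (s, p) -> run(s, p, Character::isDigit)),
        new Rule("NEWLINE", (s, p) -> s.startsWith("\r\n", p) ? 2 : s.startsWith("\n", p) ? 1 : 0),
        new Rule("WS", (s, p) -> run(s, p, c -> c == ' ')),    // (' ')+
        new Rule("WSOPT", (s, p) -> run(s, p, c -> c == ' '))  // (' ')*
    );

    // Returns the token type chosen at each position of the input.
    static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0; // zero-length matches never produce a token
            for (Rule r : RULES) {
                int len = r.matcher.applyAsInt(input, pos);
                // Only a strictly longer match replaces the current best,
                // so on a tie the rule defined earlier keeps the input.
                if (len > bestLen) {
                    best = r;
                    bestLen = len;
                }
            }
            if (best == null) throw new IllegalStateException("no rule matches at " + pos);
            types.add(best.name);
            pos += bestLen;
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypes(" 3 6\n"));
    }
}
```

Running main prints [WS, INT, WS, INT, NEWLINE]: the leading space is claimed by WS because it is listed before WSOPT, so a parser rule that asks for WSOPT sees WS instead.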

I would also like to parse input in which some lines start without whitespace, such as:

^12    34$
^ 23 97$

but I then get errors such as:

line 1:0 required (...)+ loop did not match anything at input ' '

I'd appreciate a general explanation of how to handle whitespace in ANTLR.

EDIT: @jitter has a useful answer. The {ignore=WS} option does not appear in "The Definitive ANTLR Reference", the book I am working from, so it is clearly a tricky area.

HELP still needed: I have modified the grammar to:

lines   : line line line;
line
options { ignore=WS; }
        :
                ws1  {System.out.println("WSOPT :"+$ws1.text+":");} 
                num1 {System.out.println("NUM1 "+$num1.text);} 
                ws2  {System.out.println("WS :"+$ws2.text+":");}
                num2 {System.out.println("NUM2 "+$num2.text);} 
                NEWLINE
    ;

but I get the error:

illegal option ignore

EDIT: apparently this option has been removed in ANTLR v3: http://www.antlr.org/pipermail/antlr-interest/2007-February/019423.html

A:

Check "Lexical Analysis with ANTLR" and look for the section that starts with this heading:

Ignoring whitespace in the lexer

You need to use the { ignore=WS; } option.

jitter
Thanks, I will try this and report back. FWIW, there are cases where the exact formatting also matters, so I hope I can switch.
peter.murray.rust
EDIT: It appears this is not available in ANTLR v3.
peter.murray.rust
A: 

I have managed to get this working using lexer constructs such as:

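WS  :   (' ')+ {skip();};

// WSOPT is redundant once WS is skipped: WS already matches every run of
// spaces and is defined first, so WSOPT would never be produced anyway.
// WSOPT   :       (' ')* {skip();};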

but not for NEWLINE. Then, in the parser, I use rules such as:

line    : num1 num2 NEWLINE ;

The key was to strip all WS in the lexer except the NEWLINE.
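The skip() approach can be modelled without ANTLR. The hand-written Java sketch below is hypothetical (the class and method names are invented, and it is not what ANTLR generates): the lexer consumes spaces without emitting a token, so the parser only has to handle line : INT INT NEWLINE, and lines with or without leading spaces look identical to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written sketch of "skip whitespace in the lexer".
class SkipWhitespaceDemo {
    // Lexer: emits "INT:<digits>" and "NEWLINE" tokens. Spaces are consumed
    // but no token is emitted, roughly what {skip();} does for the WS rule.
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ') {
                i++; // whitespace: consumed, no token
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                tokens.add("INT:" + input.substring(start, i));
            } else if (c == '\n' || (c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n')) {
                i += (c == '\r') ? 2 : 1; // '\r'? '\n'
                tokens.add("NEWLINE");
            } else {
                throw new IllegalArgumentException("unexpected character at " + i);
            }
        }
        return tokens;
    }

    // Reads the integer value of an INT token, or fails like a parse error.
    static int intAt(List<String> toks, int p) {
        if (p >= toks.size() || !toks.get(p).startsWith("INT:"))
            throw new IllegalStateException("expected INT at token " + p);
        return Integer.parseInt(toks.get(p).substring(4));
    }

    // Parser for: lines : line+ ;  line : INT INT NEWLINE ;
    static List<int[]> parse(String input) {
        List<String> toks = lex(input);
        List<int[]> rows = new ArrayList<>();
        int p = 0;
        while (p < toks.size()) {
            int a = intAt(toks, p);
            int b = intAt(toks, p + 1);
            if (p + 2 >= toks.size() || !toks.get(p + 2).equals("NEWLINE"))
                throw new IllegalStateException("expected NEWLINE at token " + (p + 2));
            rows.add(new int[] {a, b});
            p += 3;
        }
        if (rows.isEmpty()) throw new IllegalStateException("expected at least one line");
        return rows;
    }

    public static void main(String[] args) {
        for (int[] row : parse(" 3 6\n  97   12\n 15 18\n12    34\n")) {
            System.out.println("NUM1 " + row[0] + " NUM2 " + row[1]);
        }
    }
}
```

Because whitespace never reaches the parser, a line such as "12    34" with no leading space parses the same way as " 3 6". The trade-off is that the exact spacing is lost, which matters in the cases mentioned in the comments where formatting is significant.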

peter.murray.rust
A:
WS : (' ' | '\t')+
     {$channel = HIDDEN;}  // keep the WS token, but off the parser's default channel
   ;
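Sending WS to the HIDDEN channel differs from skip(): the WS tokens are still created and kept in the token stream, but the parser only reads tokens on the default channel. This keeps the original formatting recoverable. The Java below is a minimal, hypothetical model of that idea, not the ANTLR runtime API; the class names and channel numbers are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of a hidden token channel; names and channel numbers are illustrative.
class HiddenChannelDemo {
    static final int DEFAULT = 0;
    static final int HIDDEN = 1;

    static final class Token {
        final String type;
        final String text;
        final int channel;
        Token(String type, String text, int channel) {
            this.type = type;
            this.text = text;
            this.channel = channel;
        }
    }

    // Lexer: runs of spaces/tabs become WS tokens on the HIDDEN channel.
    static List<Token> lex(String input) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') {
                while (i < input.length() && (input.charAt(i) == ' ' || input.charAt(i) == '\t')) i++;
                out.add(new Token("WS", input.substring(start, i), HIDDEN));
            } else if (Character.isDigit(c)) {
                while (i < input.length() && Character.isDigit(input.charAt(i))) i++;
                out.add(new Token("INT", input.substring(start, i), DEFAULT));
            } else if (c == '\n') {
                i++;
                out.add(new Token("NEWLINE", "\n", DEFAULT));
            } else {
                throw new IllegalArgumentException("unexpected character at " + i);
            }
        }
        return out;
    }

    // What the parser sees: only tokens on the default channel.
    static List<String> parserView(List<Token> tokens) {
        return tokens.stream()
                .filter(t -> t.channel == DEFAULT)
                .map(t -> t.type)
                .collect(Collectors.toList());
    }

    // Hidden tokens are still in the stream, so the exact text can be rebuilt.
    static String originalText(List<Token> tokens) {
        return tokens.stream().map(t -> t.text).collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<Token> toks = lex("  97   12\n");
        System.out.println(parserView(toks));                          // [INT, INT, NEWLINE]
        System.out.println(originalText(toks).equals("  97   12\n"));  // true
    }
}
```

The parser-facing view is the same as with skip(), but the spacing is not thrown away, so either approach lets the grammar accept lines with or without leading whitespace.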
280Z28