I'm working with ANTLR 3.2. I have a simple grammar that consists of atoms (each either the character "0" or "1") and a rule that collects a comma-separated sequence of them into a list.

When I pass in "00" as input, I don't get an error, which surprises me because this should not be valid input:

C:\Users\dan\workspace\antlrtest\test>java -cp antlr-3.2.jar org.antlr.Tool Test.g
C:\Users\dan\workspace\antlrtest\test>javac -cp antlr-3.2.jar *.java
C:\Users\dan\workspace\antlrtest\test>java -cp .;antlr-3.2.jar TestParser
[0]

How can I force an error to be generated in this case? It's particularly puzzling because when I use the interpreter in ANTLRWorks on this input, it does show a NoViableAltException.

I find that if I change the grammar to require, say, a semicolon at the end, an error is generated, but that solution isn't available to me in the real grammar I am working on.

Here is the grammar, which is self-contained and runnable:

grammar Test;

@parser::members {
  public static void main(String[] args) throws Exception {
    String text = "00";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    System.out.println(new TestParser(tokens).mainRule());
  }
}

mainRule returns [List<String> words]
@init{$words = new ArrayList<String>();}
  :  w=atom {$words.add($w.text);} (',' w=atom {$words.add($w.text);} )*
  ;


atom: '0' | '1';

WS
  :  ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
  ;
A:

At the end of your mainRule, you should add an EOF token; otherwise ANTLR will stop parsing when there is no valid token to be matched.

Also, the atom rule should really be a lexer rule instead of a parser rule (lexer rule names start with a capital letter).

Try this instead:

grammar Test;

@parser::members {
  public static void main(String[] args) throws Exception {
    String text = "0,1  ,  1  , 0,1";
    ANTLRStringStream in = new ANTLRStringStream(text);
    TestLexer lexer = new TestLexer(in);
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    System.out.println(new TestParser(tokens).mainRule());
  }
}

mainRule returns [List<String> words]
@init{$words = new ArrayList<String>();}
  :  w=Atom {$words.add($w.text);} (',' w=Atom {$words.add($w.text);} )* EOF
  ;

Atom
  :  '0' | '1'
  ;

WS
  :  ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
  ;

EDIT

To clarify: as you already found out, EOF is not mandatory. It only forces the parser to go through the entire input. In this grammar, a NoViableAltException shows up when the lexer stumbles upon a character that is not handled by your lexer grammar. Since you define three tokens in your grammar (0, 1 and ,) and your input, "00", does not contain any characters not handled by your grammar, no NoViableAltException is thrown. If you change your input to something like "0?0", then a NoViableAltException will pop up.
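
To make the lexer point concrete, here is a rough hand-written Java sketch. It is not the code ANTLR generates: the class LexerSketch and its tokenize method are made up for illustration, and unlike a real ANTLR 3 lexer (which reports the problem and tries to recover) it simply throws on the first unknown character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generated lexer; names are invented for this sketch.
final class LexerSketch {

    /** Splits the input into the tokens 0, 1 and ",", skipping whitespace. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '0' || c == '1' || c == ',') {
                tokens.add(String.valueOf(c));
            } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\f') {
                continue; // plays the role of WS on the hidden channel
            } else {
                // Simplification: a real lexer would report this and keep going.
                throw new IllegalArgumentException(
                    "no viable alternative at character '" + c + "' (index " + i + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("00")); // prints [0, 0]: both characters are known tokens
        try {
            tokenize("0?0");
        } catch (IllegalArgumentException e) {
            System.out.println("lexer error: " + e.getMessage());
        }
    }
}
```

The input "00" lexes cleanly into two tokens, so any complaint about it would have to come from the parser, not the lexer.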

Since your parser finds the first 0 and then does not find a ",", it simply stops parsing, because you did not "tell" it to parse all the way to the end of the file.
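
The same behaviour can be imitated with a small hand-written parser. This is only a sketch under assumptions: EofSketch, mainRule and the requireEof flag are hypothetical names, the token list stands in for the lexer output of "00", and the error handling is far simpler than ANTLR's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hand-written imitation of:  mainRule : atom (',' atom)* EOF? ;
// Not ANTLR-generated code; all names are invented for this sketch.
final class EofSketch {

    /** Parses the tokens; when requireEof is true, leftover tokens are an error. */
    static List<String> mainRule(List<String> tokens, boolean requireEof) {
        List<String> words = new ArrayList<String>();
        int pos = atom(tokens, 0, words);
        // (',' atom)* : keep looping only while the next token is a comma
        while (pos < tokens.size() && tokens.get(pos).equals(",")) {
            pos = atom(tokens, pos + 1, words);
        }
        // Without this check the rule just returns and ignores the rest of the input.
        if (requireEof && pos < tokens.size()) {
            throw new IllegalStateException(
                "unexpected token '" + tokens.get(pos) + "' at index " + pos + ", expected EOF");
        }
        return words;
    }

    /** Matches a single 0 or 1 at pos, records it, and returns the next position. */
    private static int atom(List<String> tokens, int pos, List<String> words) {
        if (pos >= tokens.size()
                || !(tokens.get(pos).equals("0") || tokens.get(pos).equals("1"))) {
            throw new IllegalStateException("expected '0' or '1' at index " + pos);
        }
        words.add(tokens.get(pos));
        return pos + 1;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("0", "0"); // the tokens of "00"
        System.out.println(mainRule(tokens, false)); // prints [0]
        try {
            mainRule(tokens, true);
        } catch (IllegalStateException e) {
            System.out.println("parser error: " + e.getMessage());
        }
    }
}
```

With requireEof set to false the second 0 is silently left unread, which matches the [0] output in the question; with it set to true the leftover token is reported.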

Hope that clarifies things. If not, let me know.

Bart Kiers
"... ANTLR will stop parsing when there is no valid token to be matched." Thanks, Bart! But shouldn't it report an error when there is text remaining, but no valid tokens? Is this a peculiarity of ANTLR, or is it the right thing to do for some reason?
Dan Becker
Additionally, this seems to imply that you should always include an EOF at the end of the main rules for a parser, but I've seen plenty of ANTLR examples where this isn't done.
Dan Becker
One more comment: I found a message on antlr-interest that is relevant here. It sounds like you don't *always* need an EOF terminating your grammar, but in my case I think I do. The link: http://www.antlr.org/pipermail/antlr-interest/2009-January/032219.html
Dan Becker
@Dan, see my **EDIT**.
Bart Kiers