Hello, I have the following Antlr grammar:

grammar MyGrammar;

doc : intro planet;
intro   : 'hi';
planet  : 'world';
MLCOMMENT 
    : '/*' ( options {greedy=false;} : . )* '*/' { $channel = HIDDEN; };
WHITESPACE : ( 
    (' ' | '\t' | '\f')+
  |
    // handle newlines
    ( '\r\n'  // DOS/Windows
      | '\r'    // Macintosh
      | '\n'    // Unix
    )
    )
 { $channel = HIDDEN; };

In the ANTLRWorks 1.2.3 interpreter, the inputs hi world, hi/**/world and hi /*A*/ world are accepted, as expected.

However, the input hiworld, which should be rejected, is also accepted. How do I make hiworld fail? How do I require at least one whitespace character (or comment) between "hi" and "world"?

Note that I've used only MLCOMMENT and WHITESPACE to keep this example simple; the real grammar would support other kinds of comments as well.

A: 

One way to make the string hiworld fail is to use a validating semantic predicate that is guaranteed to fail, as follows:

doc:      intro planet;
failure : 'hiworld' { false }?;
intro   : 'hi';
planet  : 'world';
// rest of grammar omitted
Pourquoi Litytestdata
Very interesting, but if I added every single possible failure case to more complex grammars, the number of failure situations would grow exponentially.
luiscubal
A: 

You need to create a general ID token. Because the lexer builds the longest token it can, it would then see the input "hiworld" as a single ID token, since that match is longer than "hi" or "world" alone, and the parser would reject it where it expects 'hi'. Such a rule might look like:

ID : ('a'..'z' | 'A'..'Z')+;

For example, this is exactly how parsers for programming languages distinguish the "do" keyword from "double" (a type keyword that starts with "do") or "done" (a variable name).
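
For reference, here is a minimal sketch of how the ID rule could be combined with the grammar from the question. It assumes ANTLR 3 syntax, as in the original post; the MLCOMMENT and WHITESPACE rules are copied from the question (with the whitespace alternatives condensed), and only the ID rule is new.

```
grammar MyGrammar;

doc    : intro planet;
intro  : 'hi';
planet : 'world';

// Catch-all word token. The literals 'hi' and 'world' still win when a
// word matches them exactly, but a longer run of letters such as
// "hiworld" is lexed as one ID, which no parser rule accepts.
ID : ('a'..'z' | 'A'..'Z')+;

MLCOMMENT
    : '/*' ( options {greedy=false;} : . )* '*/' { $channel = HIDDEN; };

WHITESPACE
    : ( (' ' | '\t' | '\f')+
      | '\r\n'   // DOS/Windows
      | '\r'     // Macintosh
      | '\n'     // Unix
      )
      { $channel = HIDDEN; };
```

With this in place, hi world, hi/**/world and hi /*A*/ world should still tokenize as 'hi' followed by 'world' (with the hidden tokens skipped by the parser), while hiworld should become a single ID token and fail in doc. No per-case failure rules are needed, so the approach should scale to larger grammars.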

280Z28