I'm attempting to write a parser in JavaCC for a language that is ambiguous at the token level. In this particular case, the language uses "/" on its own as a division operator, but also uses it to delimit regular expression literals.
Consider the following JavaCC grammar:
TOKEN :
{
...
< VAR : "var" > |
< DIV : "/" > |
< EQUALS : "=" > |
< SEMICOLON : ";" > |
...
}
TOKEN :
{
< IDENTIFIER : <IDENTIFIER_START> (<IDENTIFIER_START> | <IDENTIFIER_CHAR>)* > |
< #IDENTIFIER_START : ( [ "$","_","A"-"Z","a"-"z" ] )> |
< #IDENTIFIER_CHAR : ( [ "$","_","A"-"Z","a"-"z","0"-"9" ] ) > |
< REGEX_LITERAL : ("/" <REGEX_BODY> "/" ( <REGEX_FLAGS> )? ) > |
< #REGEX_BODY : ( <REGEX_FIRST_CHAR> <REGEX_CHARS> ) > |
< #REGEX_CHARS : ( <REGEX_CHAR> )* > |
< #REGEX_FIRST_CHAR : ( ~["\r", "\n", "*", "/", "\\"] | <BACKSLASH_SEQUENCE> ) > |
< #REGEX_CHAR : ( ~[ "\r", "\n", "/", "\\" ] | <BACKSLASH_SEQUENCE> ) > |
< #BACKSLASH_SEQUENCE : ("\\" ~[ "\r", "\n"] ) > |
< #REGEX_FLAGS : ( <IDENTIFIER_CHAR> )* >
}
Given the following code:
var y = a/b/c;
Two different sets of tokens could be generated. The token stream could be either:
<VAR> <IDENTIFIER> <EQUALS> <IDENTIFIER> <DIV> <IDENTIFIER> <DIV> <IDENTIFIER> <SEMICOLON>
or
<VAR> <IDENTIFIER> <EQUALS> <IDENTIFIER> <REGEX_LITERAL> <SEMICOLON>
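To make the ambiguity concrete, here is a rough Java sketch of what I believe happens at the position right after `a`. As I understand it, the generated token manager prefers the longest match and breaks ties in favour of the rule that appears first in the grammar. The sketch is not JavaCC itself: `TokenAmbiguityDemo` and `longestMatch` are hypothetical names, and the two `java.util.regex` patterns are my hand translation of the DIV and REGEX_LITERAL rules above, so they may not match the generated lexer exactly.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it only imitates a longest-match lexer
// for the two competing rules DIV and REGEX_LITERAL.
public class TokenAmbiguityDemo {

    // Insertion order stands in for the order of rules in the grammar file.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("DIV", Pattern.compile("/"));
        // "/" REGEX_FIRST_CHAR REGEX_CHAR* "/" IDENTIFIER_CHAR*
        RULES.put("REGEX_LITERAL", Pattern.compile(
            "/(?:[^\\r\\n*/\\\\]|\\\\[^\\r\\n])"
            + "(?:[^\\r\\n/\\\\]|\\\\[^\\r\\n])*"
            + "/[$_A-Za-z0-9]*"));
    }

    // Returns the name of the rule with the longest match at the start of
    // the input, or null if no rule matches. Ties keep the earlier rule.
    public static String longestMatch(String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Remaining input after the lexer has consumed "var y = a".
        System.out.println(longestMatch("/b/c;"));
    }
}
```

Under these assumptions, `/b/c` is matched as a single REGEX_LITERAL (with `c` read as a flag) because it is longer than the one-character DIV match, which gives the second token stream rather than the first.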
How can I ensure that the TokenManager generates the token stream that I expect for this case?