OK, it seems that you need a technique called lookahead here. Here is a very good tutorial:
Lookahead tutorial
My first attempt was wrong then, but as it works for distinct tokens which define a context I'll leave it here (Maybe it's useful for somebody ;o)).
Let's say we want to have some kind of markup language. All we want to "markup" are:
- Expressions consisting of letters (abc...zABC...Z) and whitespaces --> words
- Expressions consisting of numbers (0-9) --> numbers
We want to enclose words in tags and numbers in tags. So if i got you right that is what you want to do: If you're in the word context (between word tags) the compiler should expect letters and whitespaces, in the number context it expects numbers.
I created the file WordNumber.jj which defines the grammar and the parser to be generated:
options
{
LOOKAHEAD= 1;
CHOICE_AMBIGUITY_CHECK = 2;
OTHER_AMBIGUITY_CHECK = 1;
STATIC = true;
DEBUG_PARSER = false;
DEBUG_LOOKAHEAD = false;
DEBUG_TOKEN_MANAGER = false;
ERROR_REPORTING = true;
JAVA_UNICODE_ESCAPE = false;
UNICODE_INPUT = false;
IGNORE_CASE = false;
USER_TOKEN_MANAGER = false;
USER_CHAR_STREAM = false;
BUILD_PARSER = true;
BUILD_TOKEN_MANAGER = true;
SANITY_CHECK = true;
FORCE_LA_CHECK = false;
}
PARSER_BEGIN(WordNumberParser)
/** Model-tree Parser */
public class WordNumberParser
{
/** Main entry point. */
public static void main(String args []) throws ParseException
{
WordNumberParser parser = new WordNumberParser(System.in);
parser.Input();
}
}
PARSER_END(WordNumberParser)
SKIP :
{
" "
| "\n"
| "\r"
| "\r\n"
| "\t"
}
TOKEN :
{
< WORD_TOKEN : (["a"-"z"] | ["A"-"Z"] | " " | "." | ",")+ > |
< NUMBER_TOKEN : (["0"-"9"])+ >
}
/** Root production. */
void Input() :
{}
{
( WordContext() | NumberContext() )* < EOF >
}
/** WordContext production. */
void WordContext() :
{}
{
"<WORDS>" (< WORD_TOKEN >)+ "</WORDS>"
}
/** NumberContext production. */
void NumberContext() :
{}
{
"<NUMBER>" (< NUMBER_TOKEN >)+ "</NUMBER>"
}
You can test it with a file like that:
<WORDS>This is a sentence. As you can see the parser accepts it.</WORDS>
<WORDS>The answer to life, universe and everything is</WORDS><NUMBER>42</NUMBER>
<NUMBER>This sentence will make the parser sad. Do not make the parser sad.</NUMBER>
The Last line will cause the parser to throw an exception like this:
Exception in thread "main" ParseException: Encountered " <WORD_TOKEN> "This sentence will make the parser sad. Do not make the parser sad. "" at line 3, column 9.
Was expecting:
<NUMBER_TOKEN> ...
That is because the parser did not find what it expected.
I hope that helps.
Cheers!
P.S.: The parser can't "be" inside a token as a token is a terminal symbol (correct me if I'm wrong) which can't be replaced by production rules any further. So all the context aspects have to be placed inside a production rule (non terminal) like "WordContext" in my example.