How can I write a JavaCC grammar that parses lines like these:
[b]content[/b]
content[/b]
[b]content
The parser must accept all of these lines, but it must distinguish correct tagging from incorrect tagging.
Correct tags, as in the first line, have both an opening and a closing tag. When the tags match, the enclosed content is output as bold formatted text.
Incorrect tags, as in the second and third lines, have no matching opening or closing tag. When these occur, they are written to the output as-is and are not interpreted as tags.
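To pin down the intended input/output behavior (this is not a JavaCC solution), here is a minimal hand-written Java sketch. The class name `BoldTags`, the method `render`, and the stack-based pairing rule (each `[/b]` closes the most recent unmatched `[b]`) are assumptions made for illustration; any tag left without a partner is copied through unchanged.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical reference implementation of the desired output, for illustration only.
public class BoldTags {
    private static final String OPEN = "[b]";
    private static final String CLOSE = "[/b]";

    // Converts matched [b]...[/b] pairs to <b>...</b>; unmatched tags stay literal.
    static String render(String input) {
        int n = input.length();
        // replacement[i] holds the HTML for a matched tag starting at index i, else null
        String[] replacement = new String[n];
        Deque<Integer> openTags = new ArrayDeque<>();
        int i = 0;
        while (i < n) {
            if (input.startsWith(OPEN, i)) {
                openTags.push(i);
                i += OPEN.length();
            } else if (input.startsWith(CLOSE, i)) {
                if (!openTags.isEmpty()) {
                    // Assumption: pair with the most recent unmatched open tag
                    replacement[openTags.pop()] = "<b>";
                    replacement[i] = "</b>";
                }
                i += CLOSE.length();
            } else {
                i++;
            }
        }
        // Second pass: emit replacements for matched tags, copy everything else as-is
        StringBuilder out = new StringBuilder();
        i = 0;
        while (i < n) {
            if (replacement[i] != null) {
                out.append(replacement[i]);
                i += replacement[i].equals("<b>") ? OPEN.length() : CLOSE.length();
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(render("[b]content[/b]")); // <b>content</b>
        System.out.println(render("content[/b]"));    // content[/b]
        System.out.println(render("[b]content"));     // [b]content
    }
}
```

With this pairing rule, nested input such as `[b]a[b]b[/b][/b]` yields two nested `<b>` elements, and a surplus closing tag as in `[b]a[/b][/b]` stays literal after the bold part. How partially nested tags should pair is a design choice the grammar would have to make explicitly.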
I have tried the JavaCC code below (with LOOKAHEAD = 999999). The problem is that this grammar always matches everything as invalidTag() instead of bold(). How can I make sure that the JavaCC parser matches bold() whenever possible?
String parse() :
{}
{
  body() <EOF>
  { return buffer; }
}

void body() :
{}
{
  ( content() )*
}

void content() :
{}
{
  ( text() | bold() | invalidTag() )
}

void bold() :
{}
{
  { buffer += "<b>"; }
  <BOLDSTART> ( content() )* <BOLDEND>
  { buffer += "</b>"; }
}

void invalidTag() :
{}
{
  ( <BOLDSTART> | <BOLDEND> )
  { // todo: just output token
  }
}

TOKEN :
{
    < TEXT : ( <LETTER> | <DIGIT> | <PUNCT> | <OTHER> )+ >
  | < BOLDSTART : "[b]" >
  | < BOLDEND : "[/b]" >
  | < #LETTER : ["a"-"z", "A"-"Z"] >
  | < #DIGIT : ["0"-"9"] >
  | < #PUNCT : [".", ":", ",", ";", "\t", "!", "?", " "] >
  | < #OTHER : ["*", "'", "$", "|", "+", "(", ")", "{", "}", "/", "%", "_", "-", "\"", "#", "<", ">", "=", "&", "\\"] >
}
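To see which tokens the parser actually receives for the three sample lines, here is a hypothetical hand-written Java mirror of the TOKEN section above. It is not JavaCC-generated code; the class `TokenDump` and method `tokenize` are made up for illustration. Because no two patterns can match at the same starting position, trying them in order gives the same result as JavaCC's longest-match rule here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical lexer that mirrors the TOKEN definitions; not JavaCC output.
public class TokenDump {
    private static final String[] KINDS = {"BOLDSTART", "BOLDEND", "TEXT"};
    private static final Pattern[] PATTERNS = {
        Pattern.compile("\\[b\\]"),
        Pattern.compile("\\[/b\\]"),
        // TEXT: one or more LETTER, DIGIT, PUNCT or OTHER characters
        Pattern.compile("[A-Za-z0-9.:,;\\t!? *'$|+(){}/%_\\-\"#<>=&\\\\]+")
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            boolean matched = false;
            for (int k = 0; k < PATTERNS.length && !matched; k++) {
                Matcher m = PATTERNS[k].matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt()) { // match must start exactly at pos
                    tokens.add(KINDS[k]);
                    pos = m.end();
                    matched = true;
                }
            }
            if (!matched) {
                // e.g. a lone "[" is covered by no token, similar to a JavaCC lexical error
                throw new IllegalArgumentException("No token matches at index " + pos);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"[b]content[/b]", "content[/b]", "[b]content"}) {
            System.out.println(line + " -> " + tokenize(line));
        }
    }
}
```

The token streams show that valid and invalid lines produce the same BOLDSTART and BOLDEND tokens, so the choice between bold() and invalidTag() is made entirely by the grammar, not by the lexer.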