I'm writing an LL(1) parser generator whose input is a Coco/R language specification. I already have a scanner generator for that input. Suppose I have the following specification:
COMPILER Expression
CHARACTERS
digit="0123456789".
TOKENS
number = digit{digit}.
decnumber = digit{digit}"."digit{digit}.
PRODUCTIONS
Expression = Term{"+"Term|"-"Term}.
Term = Factor{"*"Factor|"/"Factor}.
Factor = ["-"](Number|"("Expression")").
Number = (number|decnumber).
END Expression.
So if the parser generated from this grammar receives the input string "1+1", it should accept it, i.e. find a parse tree for it.
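For concreteness, here is a minimal hand-written recursive-descent parser for the PRODUCTIONS above, roughly the shape an LL(1) generator might emit (a sketch, not generated output). It assumes the scanner has already produced a list of (kind, text) pairs and that a literal terminal such as "+" arrives with the literal itself as its kind, which is exactly the assumption in question. The names `Parser`, `ParseError`, and `accepts` are hypothetical.

```python
# Hand-written LL(1) recursive-descent parser for the PRODUCTIONS above.
# Assumption: tokens are (kind, text) pairs; literal terminals such as "+"
# use the literal itself as their kind, and "EOF" marks the end of input.

class ParseError(Exception):
    pass

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens + [("EOF", "")]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def expect(self, kind):
        if self.peek() != kind:
            raise ParseError(f"expected {kind!r}, got {self.peek()!r}")
        self.pos += 1

    # Expression = Term {"+" Term | "-" Term}.
    def expression(self):
        self.term()
        while self.peek() in ("+", "-"):
            self.expect(self.peek())
            self.term()

    # Term = Factor {"*" Factor | "/" Factor}.
    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            self.expect(self.peek())
            self.factor()

    # Factor = ["-"] (Number | "(" Expression ")").
    def factor(self):
        if self.peek() == "-":
            self.expect("-")
        if self.peek() == "(":
            self.expect("(")
            self.expression()
            self.expect(")")
        else:
            self.number()

    # Number = (number | decnumber).
    def number(self):
        if self.peek() in ("number", "decnumber"):
            self.pos += 1
        else:
            raise ParseError(f"expected a number, got {self.peek()!r}")

def accepts(tokens):
    """True if the token list is a complete Expression."""
    parser = Parser(tokens)
    try:
        parser.expression()
        parser.expect("EOF")
        return True
    except ParseError:
        return False

# "1+1" as the token stream the scanner would have to deliver.
print(accepts([("number", "1"), ("+", "+"), ("number", "1")]))  # True
```

Each loop decides on a single lookahead token, which is what makes the grammar LL(1): inside Expression a "-" after a Term always means binary minus, because "-" is not in the follow set of Expression.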
My question: the character "+" is never declared as a token, yet it appears in the production for the nonterminal Expression. How should my generated scanner recognize it? As things stand, it would not recognize "+" as a token at all.
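To illustrate the problem, here is a minimal longest-match scanner built only from the TOKENS section, with `number` and `decnumber` hand-translated into regular expressions (an assumption; a generated scanner would use its own automaton). It handles numbers but stops at the "+" in "1+1". The function name `scan` is hypothetical.

```python
import re

# Assumption: the TOKENS section translated by hand into regexes.
TOKEN_PATTERNS = [
    ("decnumber", re.compile(r"[0-9]+\.[0-9]+")),
    ("number", re.compile(r"[0-9]+")),
]

def scan(text):
    """Longest-match scanner that knows only the declared TOKENS."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)
            # Keep the longest match; on a tie the earlier pattern wins.
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise ValueError(f"no token matches {text[pos]!r} at position {pos}")
        kind, m = best
        tokens.append((kind, m.group()))
        pos = m.end()
    return tokens

print(scan("12.5"))  # [('decnumber', '12.5')]
try:
    scan("1+1")
except ValueError as err:
    print(err)  # no token matches '+' at position 1
```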
Is this specification a valid input, then? Should I add this terminal to TOKENS myself, and give the scanner an error routine that skips characters it does not recognize?
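One way to explore the first option is to collect every quoted string in PRODUCTIONS and register it as an implicit token whose kind is the literal itself, alongside the declared tokens. This is a minimal sketch: the helper names (`collect_literals`, `build_token_table`, `scan`), the regex translation of TOKENS, and the whitespace skipping are all assumptions for illustration, not a description of how Coco/R itself implements this.

```python
import re

# The PRODUCTIONS section from the specification above, as a plain string.
PRODUCTIONS = '''
Expression = Term{"+"Term|"-"Term}.
Term = Factor{"*"Factor|"/"Factor}.
Factor = ["-"](Number|"("Expression")").
Number = (number|decnumber).
'''

# Assumption: the TOKENS section translated by hand into regexes.
DECLARED = [
    ("decnumber", re.compile(r"[0-9]+\.[0-9]+")),
    ("number", re.compile(r"[0-9]+")),
]

def collect_literals(productions):
    """Every distinct non-empty quoted terminal, in order of first appearance."""
    literals = []
    for lit in re.findall(r'"([^"]*)"', productions):
        if lit and lit not in literals:
            literals.append(lit)
    return literals

def build_token_table(productions):
    """Declared tokens plus one implicit token per literal (kind = literal text)."""
    implicit = [(lit, re.compile(re.escape(lit))) for lit in collect_literals(productions)]
    return DECLARED + implicit

def scan(text, table):
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():  # assumption: whitespace is skipped between tokens
            pos += 1
            continue
        best = None
        for kind, pattern in table:
            m = pattern.match(text, pos)
            # Keep the longest match; on a tie the earlier table entry wins.
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise ValueError(f"no token matches {text[pos]!r} at position {pos}")
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens

table = build_token_table(PRODUCTIONS)
print(collect_literals(PRODUCTIONS))  # ['+', '-', '*', '/', '(', ')']
print(scan("1+1", table))  # [('number', '1'), ('+', '+'), ('number', '1')]
```

Because the longest match wins, "2.5" still comes out as a single decnumber, while "+", "-", "(" and the other literals become tokens of their own.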
How do typical language specifications handle this?