ansaurus

Question

Answer 1

A:

As far as I remember (I worked with JavaCC some time back), the order in which you write the rules is the order in which they are matched, so write your rules in an order that always produces the expression you want.

nicko 2009-06-06 06:21:16

Answer 2

A:

Since JavaScript/ECMAScript has the same problem (that is, it contains regex literals and a divide operator that look just like those in your examples), you might want to look for an existing JavaCC grammar to learn from. I found one linked to from this blog entry; there may be others.

Laurence Gonsalves 2009-06-06 06:51:45

Answer 3

+1 A:

JavaCC will always consume the longest token available, and there is no way to configure it otherwise. The only way to accomplish this is to add a lexical state, say IGNORE_REGEX, that excludes the offending token, in this case <REGEX_LITERAL>. Then, whenever a token is recognized that cannot be followed by <REGEX_LITERAL>, the lexical state must be switched to IGNORE_REGEX.

With the input:

var y = a/b/c

The following would occur:

<VAR> is consumed, lexical state is set to DEFAULT
<IDENTIFIER> is consumed, lexical state is set to IGNORE_REGEX
<EQUALS> is consumed, lexical state is set to DEFAULT
<IDENTIFIER> is consumed, lexical state is set to IGNORE_REGEX

At this point there is an ambiguity in the grammar: either a <DIV> or a <REGEX_LITERAL> could be consumed. Since the lexical state is IGNORE_REGEX, and that state does not match <REGEX_LITERAL>, a <DIV> is consumed.

<DIV> is consumed, lexical state is set to DEFAULT
<IDENTIFIER> is consumed, lexical state is set to IGNORE_REGEX
<DIV> is consumed, lexical state is set to DEFAULT
<IDENTIFIER> is consumed, lexical state is set to IGNORE_REGEX
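
The trace above can be reproduced with a small hand-written tokenizer. This is only a sketch in plain Java, not a JavaCC grammar and not what JavaCC generates: the class name LexicalStateSketch, the Rule helper, the simplified token patterns, and the choice to switch to IGNORE_REGEX after a <REGEX_LITERAL> are all assumptions made for illustration. It mimics the two behaviours described here: the longest match wins (with ties going to the rule listed first), and each token selects the lexical state used for the next one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical simulation of the two lexical states; not JavaCC output.
class LexicalStateSketch {
    enum State { DEFAULT, IGNORE_REGEX }

    // One token rule: its kind, its pattern, the state to enter after it
    // matches, and whether it is active in the IGNORE_REGEX state.
    static final class Rule {
        final String kind;
        final Pattern pattern;
        final State next;
        final boolean inIgnoreRegex;

        Rule(String kind, String regex, State next, boolean inIgnoreRegex) {
            this.kind = kind;
            this.pattern = Pattern.compile(regex);
            this.next = next;
            this.inIgnoreRegex = inIgnoreRegex;
        }
    }

    // Simplified patterns (assumption). REGEX_LITERAL is the only rule
    // that is excluded from IGNORE_REGEX.
    static final List<Rule> RULES = Arrays.asList(
        new Rule("VAR", "var", State.DEFAULT, true),
        new Rule("IDENTIFIER", "[A-Za-z_][A-Za-z_0-9]*", State.IGNORE_REGEX, true),
        new Rule("EQUALS", "=", State.DEFAULT, true),
        new Rule("REGEX_LITERAL", "/[^/\\n]+/", State.IGNORE_REGEX, false),
        new Rule("DIV", "/", State.DEFAULT, true));

    // Returns the token kinds in order. With switchStates == false the
    // tokenizer stays in DEFAULT, which shows the original problem.
    static List<String> tokenize(String input, boolean switchStates) {
        List<String> kinds = new ArrayList<>();
        State state = State.DEFAULT;
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++; // whitespace is skipped, not tokenized
                continue;
            }
            Rule best = null;
            int bestEnd = pos;
            for (Rule rule : RULES) {
                if (state == State.IGNORE_REGEX && !rule.inIgnoreRegex) {
                    continue; // rule is not part of the current state
                }
                Matcher m = rule.pattern.matcher(input).region(pos, input.length());
                // Strictly longer matches replace the current best, so
                // equal-length ties go to the rule listed first.
                if (m.lookingAt() && m.end() > bestEnd) {
                    best = rule;
                    bestEnd = m.end();
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("No token at offset " + pos);
            }
            kinds.add(best.kind);
            pos = bestEnd;
            if (switchStates) {
                state = best.next;
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("var y = a/b/c", true));
        System.out.println(tokenize("var y = a/b/c", false));
    }
}
```

With state switching enabled, the input yields VAR, IDENTIFIER, EQUALS, IDENTIFIER, DIV, IDENTIFIER, DIV, IDENTIFIER, matching the trace. With it disabled, the longest-match rule turns /b/ into a single REGEX_LITERAL, which is the behaviour the extra lexical state is meant to prevent.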

Bryan Kyle 2009-06-07 05:44:19

Handling Token Ambiguity in JavaCC