
I have a couple of ANTLR rules that I can't get to work together.

The first rule is:

STRING_LITERAL
    :  '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

The second rule is:

element
    :  name '=' math_formula
    ;
math_formula
    :  '"' expression '"'
    ;

The expression is a regular C-like expression.

An example of the syntax:

 "count" = "array[3]"

count should be a string, while array[3] should be an expression.

My problem is that the lexer always returns both "count" and "array[3]" as STRING_LITERAL tokens, so the parser cannot recognize the expression.

I'm using the Java target.

EDIT: changed "variable_name" to "count".

EDIT 2: here is my second attempt:

I can detect the start of an expression with '= "', but I can't detect the end of the expression in the lexer, which causes false string matches when two elements are separated by ',':

"count1" = "array[1]",
"count2" = "array[2]"

If I use '= "' as START_EXPRESSION, the lexer treats the quote that ends the first expression and the quote that starts the second string as the delimiters of a string ",\n", which is obviously incorrect.

EDIT 3: trying syntactic predicates

I changed the STRING_LITERAL rule to:

STRING_LITERAL  
    : (~('=') '"' ( EscapeSequence | ~('\\'|'"') )* '"')=> '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;

It still doesn't work. I also didn't know how to emit the ~('=') character in the rule itself, e.g. by assigning an element label to it or something similar.

A:

I can't remember the syntax now, because it's been over 10 years, but one of ANTLR's key strengths is arbitrary-length lookahead with backtracking. So, whenever you see a double quote, look ahead to see whether the input matches an element. If it does, consume the stream as an element; if not, fall back to the STRING_LITERAL rule.


I delved back into the ANTLR reference guide, and found the syntactic predicate example. Adapting that, I think your rule would look something like this:

protected
STRING : whatever...
;
protected
EXPRESSION: whatever...
;
STRING_OR_EXPR
: ( EXPRESSION ) => EXPRESSION { $setType(EXPRESSION); }
| STRING { $setType(STRING); }
;
erickson
The problem, I think, is that expression is a parser rule, while STRING is a lexer rule. What you described above assumes that EXPRESSION is a lexer rule, which is not the case. Or maybe I misunderstood something (I'm new to ANTLR).
bluedoze
No, you are right; I forgot about that. I think something similar can be done at the parser level. I'll look into it a bit more when I have time.
erickson
In ANTLR v3, protected lexer rules have been replaced by fragment rules.
Kaleb Pederson
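For illustration only, erickson's "look ahead for a whole element, otherwise fall back to STRING_LITERAL" idea can be sketched as a hand-written Java scanner rather than ANTLR syntax. This is a sketch under assumptions, not ANTLR-generated code: the class name BacktrackingLexer and the token labels (STRING, ASSIGN, EXPRESSION, CHAR) are hypothetical, and the right-hand side is returned as raw text that a parser would still have to handle as an expression. At each double quote it records a mark, speculatively tries to match "name" = "formula", and rewinds to the mark if that fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner illustrating "look ahead for a whole
// element, else fall back to a plain string literal". Not ANTLR output.
class BacktrackingLexer {
    private final String in;
    private int pos = 0;
    private final List<String> tokens = new ArrayList<>();

    private BacktrackingLexer(String in) { this.in = in; }

    static List<String> lex(String input) { return new BacktrackingLexer(input).run(); }

    private List<String> run() {
        while (pos < in.length()) {
            char c = in.charAt(pos);
            if (Character.isWhitespace(c)) { pos++; continue; }
            if (c == '"') {
                int mark = pos;                 // remember where the quote was
                if (!tryElement()) {
                    pos = mark;                 // speculation failed: rewind
                    String s = readQuoted();    // and fall back to a plain string
                    if (s == null) throw new IllegalArgumentException("unterminated string at " + mark);
                    tokens.add("STRING:" + s);
                }
                continue;
            }
            tokens.add("CHAR:" + c);            // e.g. the ',' between elements
            pos++;
        }
        return tokens;
    }

    // Speculatively matches  "name" = "formula"  and emits tokens only on success.
    private boolean tryElement() {
        String name = readQuoted();
        if (name == null) return false;
        skipSpaces();
        if (pos >= in.length() || in.charAt(pos) != '=') return false;
        pos++;
        skipSpaces();
        String formula = readQuoted();
        if (formula == null) return false;
        tokens.add("STRING:" + name);
        tokens.add("ASSIGN");
        tokens.add("EXPRESSION:" + formula);    // raw text, parsed as an expression later
        return true;
    }

    // Reads "..." honouring backslash escapes; returns the contents, or null.
    private String readQuoted() {
        if (pos >= in.length() || in.charAt(pos) != '"') return null;
        int start = ++pos;
        while (pos < in.length() && in.charAt(pos) != '"') {
            if (in.charAt(pos) == '\\') pos++;  // skip the escaped character
            pos++;
        }
        if (pos >= in.length()) return null;    // no closing quote
        return in.substring(start, pos++);      // leave pos after the closing quote
    }

    private void skipSpaces() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
    }
}
```

Because the speculative match covers the '=' and both quoted parts, the quote that closes "array[1]" is never reused as the opening quote of a new string, which is what produced the bogus ",\n" string in EDIT 2. A quoted string that is not followed by = "..." still comes out as a plain STRING.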
A: 

It's hard to tell what the parser actually receives, given the way it is displayed on this SO page, and perhaps given quotes you added for emphasis. So pardon this basic guess, but if ANTLR actually gets

"variable_name" = "array[3]"

(note the quotes), this would be lexed as two STRING_LITERAL tokens separated by an equals sign, a sequence for which the grammar probably has no rule. Perhaps

variable_name = "array[3]"

or maybe better

variable_name = array[3]

is what you are trying to do.

EDIT:
Now that you have clarified that name is a STRING (defined elsewhere, no quotes), it is clear that the above guesses are "starting to" be correct. However, there is another problem: unless expression is defined with characters that are forbidden in a STRING_LITERAL, math_formula will be ambiguous with STRING_LITERAL. The lexer will therefore emit a STRING_LITERAL for it, and the parser will see a name '=' STRING_LITERAL sequence, for which it has no rule, instead of an element.

mjv
No, the quotes really are part of the language's syntax.
bluedoze
And my problem is as you described: the ANTLR lexer returns two STRING_LITERAL tokens because of the quotes.
bluedoze
@unknown I see that math_formula does, but what about name? What is the rule for name? Does it also include quotes, and does the content within the quotes differ from what would be a valid expression?
mjv
@unknown Exactly: I thought there might be something within expression that prevents it from being lexed as a STRING_LITERAL, but "variable_name" was certainly a fine literal...
mjv
name is of STRING type, so it does have quotes; the content of a name is not important, as it will be taken as is. The content of math_formula is an expression between quotes, so it must follow the expression rule (C-like expressions with arithmetic/logical operators).
bluedoze
So you do understand the ambiguity, right? The lexer can't tell it is looking at an element, because the math_formula part looks just like a STRING_LITERAL.
mjv
I understand the ambiguity, but I don't know how to fix it. However, the expression should always be on the right-hand side of the '='. Can this be used to make it unambiguous?
bluedoze
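One way to exploit the position of the '=', sketched here in Java under assumptions: let the lexer keep producing STRING_LITERAL for both sides, and have the parser's element logic strip the quotes from the right-hand literal and re-lex its contents with expression rules. This re-lexing step is a technique swapped in for illustration, not something proposed in the thread. ElementParser, Element and lexExpression are hypothetical names, and lexExpression only knows identifiers, integers and single-character operators; in an ANTLR grammar the equivalent would presumably be an action that hands the token's text to a second lexer/parser instance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the lexer keeps emitting STRING_LITERAL for both sides;
// the parser uses the position around '=' to decide which side is a formula.
class ElementParser {
    static final class Element {
        final String name;                  // LHS, taken as-is
        final List<String> formulaTokens;   // RHS, re-lexed as an expression
        Element(String name, List<String> formulaTokens) {
            this.name = name;
            this.formulaTokens = formulaTokens;
        }
    }

    // element : STRING_LITERAL '=' STRING_LITERAL ;  (token texts include the quotes)
    static Element parseElement(List<String> tokens) {
        if (tokens.size() != 3 || !tokens.get(1).equals("="))
            throw new IllegalArgumentException("expected STRING_LITERAL '=' STRING_LITERAL");
        String name = unquote(tokens.get(0));
        List<String> formula = lexExpression(unquote(tokens.get(2)));
        return new Element(name, formula);
    }

    static String unquote(String literal) {
        int n = literal.length();
        if (n < 2 || literal.charAt(0) != '"' || literal.charAt(n - 1) != '"')
            throw new IllegalArgumentException("not a string literal: " + literal);
        return literal.substring(1, n - 1);
    }

    // Deliberately tiny C-like expression lexer: identifiers, integers and
    // single-character operators/brackets. A real grammar would reuse its
    // existing expression rules here instead.
    static List<String> lexExpression(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            int start = i;
            if (Character.isLetter(c) || c == '_') {
                while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) i++;
            } else if (Character.isDigit(c)) {
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
            } else {
                i++;                        // single-character operator or bracket
            }
            out.add(s.substring(start, i));
        }
        return out;
    }
}
```

The design choice here is to stop fighting the lexer: both quoted parts legitimately look like strings, and only the parser knows which side of '=' it is on.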
A: 

What kind of screwball language are you trying to parse? I'd venture to guess that your best bet is to add some state to your lexer along these lines:

ASSIGN:
    ('=' '"')=> /* assuming whitespace doesn't exist */
     '=' {some_global_flaggy_thing=1;}
    |'='
    ;
STRING_LITERAL:
    {some_global_flaggy_thing==1}? '"' {$type=QUOTE; some_global_flaggy_thing=2;}
    |{some_global_flaggy_thing==2}? '"' {$type=QUOTE; some_global_flaggy_thing=0;}
    | '"' /* normal string literal stuff */ '"'
    ;

Of course, your embedded expression can't have string literals in it.
Note that I'm more familiar with ANTLR 2.

Ellery Newcomer
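The flag idea in this answer can be sketched as a hand-written Java lexer (not ANTLR-generated code) to show how the state transitions behave on the two-element input from the question. The class FlaggyLexer and the token labels are hypothetical. Unlike the grammar above, the lookahead after '=' skips whitespace, since the question's examples put a space before the quote; only a top-level '=' (flag 0) can arm the flag, and, as in the answer, the embedded expression cannot contain string literals.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written version of the flag-based lexer above.
// flag 0: normal; 1: saw '=' followed by '"'; 2: inside an embedded expression.
class FlaggyLexer {
    static List<String> lex(String in) {
        List<String> out = new ArrayList<>();
        int flag = 0;
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '=' && flag == 0) {
                out.add("ASSIGN");
                i++;
                int j = i;                      // peek past spaces, like ('=' '"')=>
                while (j < in.length() && Character.isWhitespace(in.charAt(j))) j++;
                if (j < in.length() && in.charAt(j) == '"') flag = 1;
                continue;
            }
            if (c == '"' && flag == 1) { out.add("QUOTE"); flag = 2; i++; continue; }
            if (c == '"' && flag == 2) { out.add("QUOTE"); flag = 0; i++; continue; }
            if (c == '"') {                     // ordinary string literal
                int strStart = ++i;
                while (i < in.length() && in.charAt(i) != '"') {
                    if (in.charAt(i) == '\\') i++; // skip the escaped character
                    i++;
                }
                if (i >= in.length()) throw new IllegalArgumentException("unterminated string");
                out.add("STRING_LITERAL:" + in.substring(strStart, i));
                i++;                            // consume the closing quote
                continue;
            }
            int start = i;                      // expression-ish tokens
            if (Character.isLetter(c) || c == '_') {
                while (i < in.length() && (Character.isLetterOrDigit(in.charAt(i)) || in.charAt(i) == '_')) i++;
                out.add("ID:" + in.substring(start, i));
            } else if (Character.isDigit(c)) {
                while (i < in.length() && Character.isDigit(in.charAt(i))) i++;
                out.add("INT:" + in.substring(start, i));
            } else {
                out.add("PUNCT:" + c);
                i++;
            }
        }
        return out;
    }
}
```

On "count1" = "array[1]",\n"count2" = "array[2]" the closing quote of the first formula is consumed as QUOTE in state 2 and resets the flag, so the ',' is lexed on its own and the next quote correctly opens the string "count2".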