I need to match a certain string ('[', then zero or more equals signs, then '['), and then, after some other match rules, a matching close bracket (']', then the same number of equals signs, then ']'). (The rule in between is (options{greedy=false;}:.)*, if you must know.) I have no idea how to do this in ANTLR. How can I do it?

An example: I need to match [===[whatever arbitrary text ]===] but not [===[whatever arbitrary text ]==].

I need to do it for an arbitrary number of equals signs as well, and therein lies the problem: how do I get it to match the same number of equals signs in the close as in the open? The parser rules supplied so far don't seem to help.

A: 

Your tags mention lexing, but your question itself doesn't. What you're trying to do is non-regular, so I don't think it can be done as part of lexing (though I don't remember if ANTLR's lexer is strictly regular -- it's been a couple of years since I last used ANTLR).

What you describe should be possible in parsing, however. Here's the grammar for what you described:

thingy : LBRACKET middle RBRACKET;
middle : EQUAL middle EQUAL
       | LBRACKET RBRACKET;
Laurence Gonsalves
+2  A: 

You can't easily write a lexer for it; you need parser rules. Two rules should be sufficient: one is responsible for matching the brackets, the other for matching the equals signs.

Something like this:

braces : '[' ']'
       | '[' equals ']'
       ;

equals : '=' equals '='
       | '=' braces '='
       ;

This should cover the use case you described. I'm not absolutely sure, but you may have to use a predicate in the first alternative of 'equals' to avoid ambiguous interpretations.
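To make the rule invocations easier to follow, here is a rough hand-written recursive-descent sketch of the same two rules in plain Java. It is not ANTLR output; the class and method names (BracketRecognizer, bracesRule, equalsRule) are made up for illustration. In this hand-written version, one character of lookahead after '=' is enough to pick the alternative.

```java
// Hypothetical sketch: a recursive-descent recognizer mirroring the two rules above.
final class BracketRecognizer {
    private final String s;
    private int pos = 0;

    private BracketRecognizer(String s) { this.s = s; }

    /** True if the whole input is derivable from the 'braces' rule. */
    static boolean matches(String input) {
        BracketRecognizer r = new BracketRecognizer(input);
        return r.bracesRule() && r.pos == input.length();
    }

    // Consume c if it is the next character.
    private boolean eat(char c) {
        if (pos < s.length() && s.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    // braces : '[' ']' | '[' equals ']'
    private boolean bracesRule() {
        if (!eat('[')) return false;
        if (eat(']')) return true;            // alternative 1
        return equalsRule() && eat(']');      // alternative 2
    }

    // equals : '=' equals '=' | '=' braces '='
    private boolean equalsRule() {
        if (!eat('=')) return false;
        boolean nextIsEquals = pos < s.length() && s.charAt(pos) == '=';
        boolean inner = nextIsEquals ? equalsRule() : bracesRule();
        return inner && eat('=');
    }
}
```

For example, BracketRecognizer.matches("[==[]==]") accepts, while BracketRecognizer.matches("[=[]==]") rejects because the closing side has one equals sign too many. Note that, like the grammar, this has no place for arbitrary text between the inner brackets.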

Edit:

It is hard to integrate your non-greedy rule while avoiding a lexer context switch or something similar (which is hard in ANTLR). But if you are willing to put a little bit of Java in your grammar, you can write a lexer rule.

The following example grammar shows how:

grammar TestLexer;

SPECIAL
    :   '[' { int counter = 0; }
        ('=' { counter++; } )+
        '['
        (options{greedy=false;}:.)*
        ']'
        ('=' { counter--; } )+
        { if(counter != 0) throw new RecognitionException(input); }
        ']'
    ;

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;

rule    : ID
    | SPECIAL
    ;
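The counting idea in the SPECIAL rule can also be sketched outside ANTLR, which may make the interaction with the non-greedy body easier to see. The following is only an illustration in plain Java under my own assumptions: the names LongBracketMatcher and match are hypothetical, zero equals signs are allowed (as the question asks), and on a close bracket with the wrong number of equals signs it keeps scanning instead of throwing, which differs from the lexer rule above.

```java
// Hypothetical sketch: match '[' '='* '[' ... ']' '='* ']' with equal '=' counts.
final class LongBracketMatcher {
    /**
     * Tries to match a bracketed block starting at index start.
     * Returns the index just past the closing bracket, or -1 if there is no match.
     */
    static int match(String s, int start) {
        int i = start;
        if (i >= s.length() || s.charAt(i) != '[') return -1;
        i++;

        // Count the equals signs in the opening bracket (may be zero).
        int level = 0;
        while (i < s.length() && s.charAt(i) == '=') { level++; i++; }
        if (i >= s.length() || s.charAt(i) != '[') return -1;
        i++;

        // Non-greedy body: stop at the first ']' followed by exactly
        // 'level' equals signs and another ']'.
        for (; i < s.length(); i++) {
            if (s.charAt(i) != ']') continue;
            int j = i + 1;
            int seen = 0;
            while (j < s.length() && seen < level && s.charAt(j) == '=') { seen++; j++; }
            if (seen == level && j < s.length() && s.charAt(j) == ']') return j + 1;
        }
        return -1; // no matching close bracket found
    }
}
```

With this sketch, match("[===[whatever arbitrary text ]===]", 0) returns the length of the string, and match("[===[whatever arbitrary text ]==]", 0) returns -1.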
Arne
How does this support fewer than two equals signs in a bracket?
RCIX
Please specify which case you think is not covered, and I can tell you the sequence of rule invocations. E.g. [=[]=] => [ equals ] (braces rule 2) => [= braces =] (equals rule 2) => [=[]=] (braces rule 1)
Arne
But how can I insert `(options{greedy=false;}:.)*` in there? I understand how the rule works now, but I don't understand how to fit that in.
RCIX