ansaurus

Question

ANTLR rule to consume fixed number of characters

Answer 1


Since input like s:3:"a"b"; is valid, you can't define a String token in your lexer, unless the first and last double quote are always the start and end of your string. But I guess this is not the case.
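To see why a purely quote-delimited token does not work here, a small sketch outside ANTLR, using Python's re module, may help. The two patterns below are hypothetical stand-ins for a "String token" defined only by its quotes; they are not part of the grammar in this answer.

```python
import re

# Hypothetical naive pattern: the string ends at the first double quote.
NAIVE = re.compile(r's:(\d+):"([^"]*)";')

# Hypothetical greedy pattern: the string ends at the last '";' in the input.
GREEDY = re.compile(r's:(\d+):"(.*)";')

# The payload a"b contains a quote, so the naive pattern cannot match at all.
naive_match = NAIVE.match('s:3:"a"b";')

# With two strings back to back, the greedy pattern swallows both of them.
greedy_payload = GREEDY.match('s:1:"a";s:1:"b";').group(2)
```

Neither pattern uses the declared length, which is the only reliable indication of where the payload ends.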

So, you'll need a lexer rule like this:

SString
  :  's:' Int ':"' ( . )* '";'
  ;

In other words: match s:, then an integer value followed by :", then zero or more characters that can be anything, ending with ";. But you need to tell the lexer to stop consuming once it has read as many characters as the value of Int specifies. You can do that by mixing some plain code into your grammar, which you embed by wrapping it inside { and }. So first convert the text the Int token holds into an integer variable called chars:

SString
  :  's:' Int {chars = int($Int.text)} ':"' ( . )* '";'
  ;

Now embed some code inside the ( . )* loop to stop it consuming as soon as chars is counted down to zero:

SString
  :  's:' Int {chars = int($Int.text)} ':"' ( {if chars == 0: break} . {chars = chars-1} )* '";'
  ;

and that's it.
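The counting logic of the rule above can also be sketched in plain Python, independent of ANTLR. This is only an illustration of the technique, not the code ANTLR generates; the function names read_sstring and read_all are made up for this example.

```python
def read_sstring(text, pos=0):
    """Read one s:N:"..."; token starting at pos; return (token, next_pos)."""
    if not text.startswith('s:', pos):
        raise ValueError('expected s: at offset %d' % pos)
    i = pos + 2

    # Int: one or more of the digits '0'..'9'
    digits_start = i
    while i < len(text) and text[i] in '0123456789':
        i += 1
    if i == digits_start:
        raise ValueError('expected a length at offset %d' % i)
    chars = int(text[digits_start:i])

    if not text.startswith(':"', i):
        raise ValueError('expected :" at offset %d' % i)
    i += 2

    # Equivalent of the ( . )* loop: consume any character, counting down
    # until chars reaches zero.
    while chars != 0:
        if i >= len(text):
            raise ValueError('input ended inside the string')
        i += 1
        chars = chars - 1

    if not text.startswith('";', i):
        raise ValueError('expected "; at offset %d' % i)
    i += 2
    return text[pos:i], i


def read_all(text):
    """Read consecutive tokens until the input is exhausted."""
    tokens = []
    pos = 0
    while pos < len(text):
        token, pos = read_sstring(text, pos)
        tokens.append(token)
    return tokens
```

For example, read_all('s:3:"a"b";') returns a single token covering the whole input, because the quote inside the payload is consumed by the counting loop rather than treated as a terminator.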

A little demo grammar:

grammar Test;

options {
  language=Python;
}

parse
  :  (SString {print 'parsed: [\%s]' \% $SString.text})+ EOF
  ;

SString
  :  's:' Int {chars = int($Int.text)} ':"' ( {if chars == 0: break} . {chars = chars-1} )* '";'
  ;

Int
  :  '0'..'9'+
  ;

(note that you need to escape the % as \% inside your grammar, because ANTLR 3 otherwise treats % in an action as the start of a template expression!)

And a test script:

import antlr3
from TestLexer import TestLexer
from TestParser import TestParser

# Four strings back to back; the second one holds a single double quote.
source = 's:6:"length";s:1:""";s:0:"";s:3:"end";'
char_stream = antlr3.ANTLRStringStream(source)
lexer = TestLexer(char_stream)
tokens = antlr3.CommonTokenStream(lexer)
parser = TestParser(tokens)
parser.parse()  # prints one "parsed: [...]" line per SString token

which produces the following output:

parsed: [s:6:"length";]
parsed: [s:1:""";]
parsed: [s:0:"";]
parsed: [s:3:"end";]

Bart Kiers 2010-10-25 07:06:50
