ansaurus

Question

How to do Unicode escape decoding in an ANTLR tokenizer

Answer 1

A:

Michael wrote:

This is in Java, so representation shouldn't be an issue for Character or String.

Yeah, but in a Java source file the Unicode literals look just the same, so I'm not sure what you mean.

Michael wrote:

I am just wondering how to do the replacement. If it makes it easier, say I want to replace all UNICODE_ESC fragments with the character '?' while parsing.

Okay, that can be done like this:

Token : 'x' {setText("?");} ;

where Token matches the literal 'x', and the text of the resulting token is then replaced with "?".
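The same setText idea could be extended from a fixed "?" to the decoded character. Below is a minimal sketch of a plain Java helper for that, not a definitive implementation. The class name UnicodeEscapes and its decode method are hypothetical, and the sketch assumes an escape is always a backslash, the letter u, and exactly four hex digits.

```java
public final class UnicodeEscapes {
    private UnicodeEscapes() {}

    // Replaces every escape of the form backslash, 'u', four hex digits
    // with the single character it encodes. All other text is copied as-is.
    public static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 6 <= s.length()
                    && s.charAt(i + 1) == 'u'
                    && isHex(s, i + 2, i + 6)) {
                // Parse the four hex digits and append the decoded char.
                out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                // Not a complete escape: keep the character unchanged.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if every character in s[from, to) is a hexadecimal digit.
    private static boolean isHex(String s, int from, int to) {
        for (int i = from; i < to; i++) {
            if (Character.digit(s.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

In a lexer rule, an action along the lines of {setText(UnicodeEscapes.decode(getText()));} should then rewrite the token text, assuming the lexer exposes getText() alongside setText(). Note that this sketch does not treat a doubled backslash specially, so if the input format allows escaping the backslash itself, that case would need extra handling.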

Bart Kiers 2010-10-02 07:25:44

By "This is in Java" I meant that my parser is written in Java. The language being parsed is a BlackBerry rrc file. I want to canonicalize the strings, so that the six-character string "\u0067" and the one-character string "g" are treated as equal. I am trying the setText("?") idea now and will let you know.

Michael Donohue 2010-10-02 22:08:06

@Michael, ah I see. Let me know if it works out for you.

Bart Kiers 2010-10-03 06:29:02
