I've created an ANTLR grammar using ANTLRWorks and built a localization tool for internal use. I would like to convert Unicode escape sequences into the actual Java characters while parsing, but I'm unsure of the best way to do this. Here are the token definitions in my grammar. Is there some way to specify an action for the fragment UNICODE_ESC that would return the character instead of the six-character escape sequence?

ID  :   ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
    ;

INT :   '0'..'9'+
    ;

COMMENT
    :   '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
    |   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
    ;

WS  :   ( ' '
        | '\t'
        | '\r'
        | '\n'
        ) {$channel=HIDDEN;}
    ;

STRING
    :  '"' ( ESC_SEQ | ~('\\'|'"') )* '"'
    ;

fragment
HEX_DIGIT : ('0'..'9'|'a'..'f'|'A'..'F') ;

fragment
ESC_SEQ
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
    |   UNICODE_ESC
    |   OCTAL_ESC
    ;

fragment
OCTAL_ESC
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7') ('0'..'7')
    |   '\\' ('0'..'7')
    ;

fragment
UNICODE_ESC
    :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
    ;
A: 

Michael wrote:

This is in Java, so representation shouldn't be an issue for Character or String.

Yes, but in a Java source file the Unicode escapes look exactly the same, so I'm not sure what you mean.

Michael wrote:

I am just wondering how to do the replacement. If it makes it easier, say I want to replace all UNICODE_ESC fragments with the character '?' while parsing.

Okay, that can be done like this:

Token : 'x' {setText("?");} ;

where Token matches the literal 'x' and its text is then replaced with "?".
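If the goal is to get the real character rather than a fixed "?", one option is to decode the escapes in plain Java and pass the result to setText. The following is only a minimal sketch: the class name UnicodeEscapes and the method decode are hypothetical helpers, not part of ANTLR, and it handles only the four-hex-digit form matched by UNICODE_ESC (octal and single-character escapes are left unchanged).

```java
// Hypothetical helper (not part of ANTLR): decodes backslash-u escapes with
// exactly four hex digits, the same shape the UNICODE_ESC fragment matches.
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    /** Returns s with each four-hex-digit Unicode escape replaced by the char it names. */
    static String decode(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(i + 1);
                if (next == 'u' && i + 6 <= s.length() && isHex(s, i + 2, i + 6)) {
                    // Parse the four hex digits and emit the UTF-16 code unit.
                    out.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
                    i += 6;
                    continue;
                }
                // Any other escape: copy both characters unchanged, so that an
                // escaped backslash followed by 'u' is not misread as an escape.
                out.append(c).append(next);
                i += 2;
                continue;
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    /** True if every character of s in [from, to) is a hexadecimal digit. */
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

Assuming the ANTLR 3 Java target, where lexer actions can call getText() and setText(), the helper could be applied in an action at the end of the STRING rule, for example { setText(UnicodeEscapes.decode(getText())); }. Doing the replacement on the whole STRING token rather than inside the UNICODE_ESC fragment keeps all the text handling in one place, but I have not tested this against your grammar.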

Bart Kiers
"This is in Java" means my parser is written in Java. The input being parsed is a BlackBerry rrc file. I want to canonicalize the strings so that the six-character string "\u0067" and the one-character string "g" are treated as equal. I am trying the setText("?") idea now and will let you know.
Michael Donohue
@Michael, ah I see. Let me know if it works out for you.
Bart Kiers