ansaurus

Question

How to handle escape sequences in string literals in ANTLR 3?

Answer 1

+4 A:

Here is how I accomplished this in the JSON parser I wrote.

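STRING
@init{StringBuilder lBuf = new StringBuilder();}  // collects the unescaped text
    :
           '"'
           ( escaped=ESC {lBuf.append(escaped.getText());} |   // escape sequence, translated by ESC
             normal=~('"'|'\\'|'\n'|'\r')     {lBuf.appendCodePoint(normal);} )*   // any other character, copied as-is
           '"'
           {setText(lBuf.toString());}   // token text becomes the unescaped string, without the quotes
    ;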

fragment
ESC
    : '\\'
     ( 'n'    {setText("\n");}
     | 'r'    {setText("\r");}
     | 't'    {setText("\t");}
     | 'b'    {setText("\b");}
     | 'f'    {setText("\f");}
     | '"'    {setText("\"");}
     | '\''   {setText("\'");}
     | '/'    {setText("/");}
     | '\\'   {setText("\\");}
     | ('u')+ i=HEX_DIGIT j=HEX_DIGIT k=HEX_DIGIT l=HEX_DIGIT
              {setText(ParserUtil.hexToChar(i.getText(),j.getText(),k.getText(),l.getText()));}
     )
    ;

Bruno Ranschaert 2009-02-16 10:58:24
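
The grammar above calls ParserUtil.hexToChar, which is not shown in the answer. A minimal sketch of what such a helper might look like, assuming it takes the four hex digits as separate strings and returns the decoded character as a String (as the setText call requires). The body is a guess for illustration, not the author's code:

```java
// Hypothetical reconstruction of the helper used by the ESC rule.
public final class ParserUtil {
    private ParserUtil() {}

    // Joins four hex digits (e.g. "0","0","4","1") into one UTF-16 code unit
    // and returns it as a one-character String.
    public static String hexToChar(String a, String b, String c, String d) {
        int codeUnit = Integer.parseInt(a + b + c + d, 16);
        return String.valueOf((char) codeUnit);
    }
}
```

Characters outside the Basic Multilingual Plane would arrive as two consecutive \u escapes (a surrogate pair), so decoding one code unit at a time is enough here.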

I used this approach, but note that I had to append "getText()" instead of "escaped.getText()" at each step. The fragment's setText() writes the unescaped text to the enclosing STRING token, and that is what getText() returns. For me, escaped.getText() returned the original fragment with the backslashes intact.

CapnNefarious 2009-03-20 14:39:32
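
Since the two variants can behave differently depending on the ANTLR version, it may help to have an independent reference to compare the lexer's output against. The following is a rough plain-Java sketch of the same escape table as the ESC rule; the class and method names are made up for illustration and are not part of ANTLR or of the answer's code:

```java
// Hypothetical reference implementation for checking the text of STRING tokens.
public final class EscapeReference {
    private EscapeReference() {}

    // Unescapes the body of a string literal (the part between the quotes).
    public static String unescape(String body) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i++);
            if (c != '\\') {            // ordinary character: copy as-is
                out.append(c);
                continue;
            }
            if (i >= body.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = body.charAt(i++);
            switch (e) {
                case 'n':  out.append('\n'); break;
                case 'r':  out.append('\r'); break;
                case 't':  out.append('\t'); break;
                case 'b':  out.append('\b'); break;
                case 'f':  out.append('\f'); break;
                case '"': case '\'': case '/': case '\\':
                    out.append(e);      // these escapes stand for themselves
                    break;
                case 'u':
                    // Mirror ('u')+ in the grammar: allow repeated 'u'.
                    while (i < body.length() && body.charAt(i) == 'u') i++;
                    if (i + 4 > body.length()) {
                        throw new IllegalArgumentException("truncated \\u escape");
                    }
                    int unit = 0;
                    for (int k = 0; k < 4; k++) {
                        int d = Character.digit(body.charAt(i++), 16);
                        if (d < 0) throw new IllegalArgumentException("bad hex digit");
                        unit = unit * 16 + d;
                    }
                    out.append((char) unit);
                    break;
                default:
                    throw new IllegalArgumentException("unknown escape: \\" + e);
            }
        }
        return out.toString();
    }
}
```

Feeding the same literal to the lexer and to unescape and comparing the two results should show quickly whether escaped.getText() or getText() is the right call in a given setup.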

Answer 2

+2 A:

I needed to do exactly that, but my target was C rather than Java. Here is how I did it, based on Answer 1 and its comment, in case anyone needs something similar:

QUOTE   :      '\'';
STR
@init{ pANTLR3_STRING unesc = GETTEXT()->factory->newRaw(GETTEXT()->factory); }
        :       QUOTE ( reg = ~('\\' | '\'') { unesc->addc(unesc, reg); }
                        | esc = ESCAPED { unesc->appendS(unesc, GETTEXT()); } )+ QUOTE { SETTEXT(unesc); };

fragment
ESCAPED :       '\\'
                ( '\\' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\\")); }
                | '\'' { SETTEXT(GETTEXT()->factory->newStr8(GETTEXT()->factory, (pANTLR3_UINT8)"\'")); }
                )
        ;

HTH.

2009-06-02 16:46:25
