
By string literals I also mean those containing \123-like escape sequences. I've written something, but I don't know whether it's perfect:

<STRING> {
  \"                             { yybegin(YYINITIAL); 
                                   return new Token(TokenType.STRING,string.toString()); }
  \\[0-3][0-7][0-7]              { string.append( yytext() ); }
  \\[0-3][0-7]                   { string.append( yytext() ); }
  \\[0-7]                        { string.append( yytext() ); }
  [^\n\r\"\\]+                   { string.append( yytext() ); }
  \\t                            { string.append('\t'); }
  \\n                            { string.append('\n'); }
  \\r                            { string.append('\r'); }
  \\\"                           { string.append('\"'); }
  \\                             { string.append('\\'); }
}

In fact, I know this isn't perfect: in the three rules that match \ddd-like escapes, I append the escape's textual representation to the string rather than the character itself. I could try to convert it using Character methods, but then I might still not be exhaustive, since there may be other escape sequences I haven't handled. So if there is a canonical JFlex file for this, that would be perfect.
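One way to append the actual character instead of its representation is to drop the leading backslash from the matched text and parse the remaining digits in base 8. This is a sketch that assumes each octal rule's yytext() is a backslash followed by 1 to 3 octal digits (which the three patterns guarantee); OctalEscapes and octalEscapeToChar are hypothetical names, not part of JFlex.

```java
/** Hypothetical helper for the octal rules of the STRING state. */
final class OctalEscapes {
    private OctalEscapes() {}

    /**
     * Converts text matched by an octal rule, such as "\\123", "\\12" or "\\7",
     * into the character it denotes. Assumes the lexer pattern has already
     * ensured the text is a backslash plus 1-3 octal digits (value <= 0377).
     */
    static char octalEscapeToChar(String matched) {
        // Skip the leading backslash and parse the remaining digits as octal.
        int value = Integer.parseInt(matched.substring(1), 8);
        return (char) value;
    }
}
```

Each octal action could then become string.append(OctalEscapes.octalEscapeToChar(yytext()));. Alternatively, the method can live inside the generated lexer class by putting it in JFlex's %{ ... %} user-code block.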

A: 

Section 3.10.5 (String Literals) of the JLS defines String literals as follows:

    StringLiteral:
      " StringCharacters* "

    StringCharacters:
      StringCharacter
      StringCharacters StringCharacter

    StringCharacter:
      InputCharacter but not " or \
      EscapeSequence

where EscapeSequence is defined in section 3.10.6 as:

    EscapeSequence:
      \ b            /* \u0008: backspace BS */
      \ t            /* \u0009: horizontal tab HT */
      \ n            /* \u000a: linefeed LF */
      \ f            /* \u000c: form feed FF */
      \ r            /* \u000d: carriage return CR */
      \ "            /* \u0022: double quote " */
      \ '            /* \u0027: single quote ' */
      \ \            /* \u005c: backslash \ */
      OctalEscape    /* \u0000 to \u00ff: from octal value */

    OctalEscape:
      \ OctalDigit
      \ OctalDigit OctalDigit
      \ ZeroToThree OctalDigit OctalDigit

    OctalDigit: one of
      0 1 2 3 4 5 6 7

    ZeroToThree: one of
      0 1 2 3
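To make the grammar concrete, here is a sketch in plain Java (outside JFlex) that decodes the characters between the quotes according to the EscapeSequence and OctalEscape rules above. The class and method names (JavaStringEscapes.unescape) are made up for illustration, and Unicode escapes are deliberately not handled, because JLS 3.3 translates them in an earlier step.

```java
/** Hypothetical decoder for the body of a Java string literal (JLS 3.10.6). */
final class JavaStringEscapes {
    private JavaStringEscapes() {}

    /**
     * Decodes the text between the quotes of a string literal.
     * Unicode escapes (backslash-u) are NOT handled here.
     *
     * @throws IllegalArgumentException on an invalid or dangling escape
     */
    static String unescape(String body) {
        StringBuilder out = new StringBuilder(body.length());
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i++);
            if (c != '\\') {
                out.append(c); // ordinary StringCharacter
                continue;
            }
            if (i >= body.length()) {
                throw new IllegalArgumentException("dangling backslash");
            }
            char e = body.charAt(i++);
            switch (e) {
                case 'b': out.append('\b'); break;
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'f': out.append('\f'); break;
                case 'r': out.append('\r'); break;
                case '"': out.append('"'); break;
                case '\'': out.append('\''); break;
                case '\\': out.append('\\'); break;
                default: {
                    if (e < '0' || e > '7') {
                        throw new IllegalArgumentException("invalid escape: \\" + e);
                    }
                    // OctalEscape: up to 3 digits if the first is 0-3 (ZeroToThree),
                    // otherwise at most 2 digits.
                    int maxDigits = (e <= '3') ? 3 : 2;
                    int value = e - '0';
                    int digits = 1;
                    while (digits < maxDigits && i < body.length()
                            && body.charAt(i) >= '0' && body.charAt(i) <= '7') {
                        value = value * 8 + (body.charAt(i++) - '0');
                        digits++;
                    }
                    out.append((char) value);
                }
            }
        }
        return out.toString();
    }
}
```

Note the octal branch: any octal digit may start a two-digit escape, so \47 is a single escape (an apostrophe) and \477 decodes to an apostrophe followed by 7. In the question's rules, the two-digit pattern would therefore need to be \\[0-7][0-7] rather than \\[0-3][0-7] for \47 to be matched as one escape.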

Note that \' is also a valid escape sequence in a String literal, and at the moment you are also missing \b and \f. You may also want to account for the Unicode escapes that can appear anywhere in a Java source file (and thus in String literals as well): \u followed by four hex digits, where each hex digit is one of 0-9, a-f or A-F; the u may also be repeated, as in \uu0041.
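Because JLS 3.3 translates Unicode escapes before the input is split into tokens (so a \u0022 inside a literal actually ends it), one option is a separate pre-pass over the input rather than extra rules in the STRING state. The sketch below follows the JLS rule that a backslash can only start a Unicode escape when it is preceded by an even number of contiguous raw backslashes; UnicodeEscapes.translate is a hypothetical name, and wiring it into a JFlex lexer (for example by translating the input before handing it to the scanner) is left as an assumption.

```java
/** Hypothetical pre-pass applying JLS 3.3 Unicode-escape translation. */
final class UnicodeEscapes {
    private UnicodeEscapes() {}

    /** Returns 0-15 for an ASCII hex digit, or -1 otherwise. */
    private static int hexValue(char h) {
        if (h >= '0' && h <= '9') return h - '0';
        if (h >= 'a' && h <= 'f') return h - 'a' + 10;
        if (h >= 'A' && h <= 'F') return h - 'A' + 10;
        return -1;
    }

    /**
     * Replaces every eligible backslash-u escape (one or more 'u's followed
     * by four hex digits) with the character it denotes.
     *
     * @throws IllegalArgumentException if an eligible escape is malformed
     */
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int precedingBackslashes = 0; // raw backslashes directly before index i
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            boolean eligible = c == '\\' && precedingBackslashes % 2 == 0;
            if (eligible && i + 1 < src.length() && src.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // UnicodeMarker: u {u}
                }
                if (j + 4 > src.length()) {
                    throw new IllegalArgumentException("truncated Unicode escape at " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int d = hexValue(src.charAt(k));
                    if (d < 0) {
                        throw new IllegalArgumentException("bad hex digit at " + k);
                    }
                    value = value * 16 + d;
                }
                out.append((char) value);
                // The last raw character was a hex digit, so the count restarts;
                // the produced character is never re-scanned as part of an escape.
                precedingBackslashes = 0;
                i = j + 4;
                continue;
            }
            out.append(c);
            precedingBackslashes = (c == '\\') ? precedingBackslashes + 1 : 0;
            i++;
        }
        return out.toString();
    }
}
```

With this ordering, \u005cu005a yields a backslash followed by u005a (not Z), and an escaped backslash followed by u0041 is left alone, matching the JLS behavior.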

Bart Kiers
A: 

Yes. Download JFlex and look at the file examples/java/java.flex. It contains definitions, in JFlex syntax, for all of the lexical components of the Java language.

Cheers.

mrrtnn