I know there are already many questions like mine, but I found no answer that works in Java, so I am asking a new one.

I have text files with content like this:

key1 = "This is a \"test\" text with escapes using '\\' characters";
key2 = 'It must work with \'single\' quotes and "double" quotes';

I need a regular expression that matches the values in double quotes (or single quotes). It must support escaped quotes and escaped backslashes, and it must work with the standard Java Pattern/Matcher classes.

A: 

Try this regular expression:

'([^\\']+|\\([btnfr"'\\]|[0-3]?[0-7]{1,2}|u[0-9a-fA-F]{4}))*'|"([^\\"]+|\\([btnfr"'\\]|[0-3]?[0-7]{1,2}|u[0-9a-fA-F]{4}))*"

And as a Java string literal:

"'([^\\\\']+|\\\\([btnfr\"'\\\\]|[0-3]?[0-7]{1,2}|u[0-9a-fA-F]{4}))*'|\"([^\\\\\"]+|\\\\([btnfr\"'\\\\]|[0-3]?[0-7]{1,2}|u[0-9a-fA-F]{4}))*\""
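For illustration, here is a minimal sketch of how the pattern above might be applied with the standard Pattern/Matcher classes to the sample lines from the question. The class name QuotedValueDemo and the helper findQuoted are hypothetical names chosen for this example, and the string literal is split in two only for readability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedValueDemo {
    // The regular expression from above, written as a Java string literal.
    // First half: single-quoted values; second half: double-quoted values.
    private static final Pattern QUOTED = Pattern.compile(
        "'([^\\\\']+|\\\\([btnfr\"'\\\\]|[0-3]?[0-7]{1,2}|u[0-9a-fA-F]{4}))*'"
        + "|\"([^\\\\\"]+|\\\\([btnfr\"'\\\\]|[0-3]?[0-7]{1,2}|u[0-9a-fA-F]{4}))*\"");

    // Hypothetical helper: returns every quoted value found in the input,
    // including its surrounding quotes and with escapes left untouched.
    public static List<String> findQuoted(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // The two sample lines from the question, escaped for Java source.
        String text =
            "key1 = \"This is a \\\"test\\\" text with escapes using '\\\\' characters\";\n"
            + "key2 = 'It must work with \\'single\\' quotes and \"double\" quotes';";
        for (String literal : findQuoted(text)) {
            System.out.println(literal);
        }
    }
}
```

Each match still contains the outer quotes and the raw escape sequences; stripping the quotes and decoding the escapes would be a separate step.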
Gumbo
Seems to work so far, thanks.
kayahr
Crikey, that's a regex and a half. Did you just come up with this, or is it something you've used for a period of time? (E.g., how well tested would you say it is?)
T.J. Crowder
@OP: This looks like it's tailored to process Java strings and similar (it handles Unicode escapes like `\u1234`, for instance, and the usual Java `\f`, `\t` and such). Just mentioning it in case your source data is slightly different from that, since you didn't actually say the strings were in the Java style, just that they may have backslash-escaped quotes and backslashes. In fact, it sounds to me like your strings are JavaScript strings (which have very nearly the same syntax as Java strings, so you're probably fine).
T.J. Crowder
Yes, the strings are JavaScript. But I parse them with Java.
kayahr