Hi,

I am looking for a regular expression that matches string literals in Java source code.

Is it possible?

private String Foo = "A potato";
private String Bar = "A \"car\"";

My intent is to replace every quoted string inside another string with something else, using:

String A = "I went to the store to buy a \"coke\"";
String B = A.replaceAll(REGEX,"Pepsi");

Something like this.

+1  A: 

You can look at the parser generators available for Java and at how their Java grammars define the StringLiteral rule.

Here is an example from ANTLR:

StringLiteral
    :  '"' ( EscapeSequence | ~('\\'|'"') )* '"'
    ;
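
For readers who are not comfortable with grammars, the rule above can be approximated with a plain java.util.regex pattern. This is only a sketch: the class and method names (StringLiteralRegex, findLiterals) are made up for illustration, and EscapeSequence is simplified to "a backslash followed by any one character", which is looser than the real grammar. Like the rule as quoted, it does not stop at line breaks or know about comments.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StringLiteralRegex {
    // Regex as the engine sees it:  "(?:\\.|[^"\\])*"
    // Opening quote, then any run of either "backslash + one char" (a simplified
    // EscapeSequence) or "one char that is neither a quote nor a backslash",
    // then the closing quote.
    static final Pattern STRING_LITERAL =
            Pattern.compile("\"(?:\\\\.|[^\"\\\\])*\"");

    // Returns every string literal found in the given Java source text.
    static List<String> findLiterals(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = STRING_LITERAL.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Java source text, as it would look when read from a .java file.
        String source = "private String Foo = \"A potato\";\n"
                      + "private String Bar = \"A \\\"car\\\"\";";
        for (String literal : findLiterals(source)) {
            System.out.println(literal);
        }
        // Prints:
        // "A potato"
        // "A \"car\""
    }
}
```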
Uri
I guess you would want to avoid catching `// "hello"`
aioobe
I was always under the impression that most Java compilers pre-process comments out and only then look for everything else. But I might be wrong about this.
Uri
My problem with this answer is that I am not very comfortable with grammars.
Fork
@Fork: My apologies. I assumed you were writing a parser for Java which is why you would care about the string literals....
Uri
No worries. I've tried that once, didn't go well :)
Fork
A: 

You don't say what tool you're using to do the finding (Perl? sed? a text editor's Ctrl-F?). But a general regex would be:

\".*?\"

Edit: this is a quick & dirty answer, and doesn't cope with escaped quotes, comments etc

Richard
What about escaped quotes in the string?
Joe
I would imagine it's Java regex, considering the Java tag.
glowcoder
This will also match quotes in comments. It shouldn't have false negatives, but it will definitely have false positives.
Mark Peters
@glowcoder: I think the Java tag has to do with the fact that he wants to match text representing a String literal according to the Java spec, not that he wants to use Java to do the matching itself.
Mark Peters
I will use Java to do the matching. Sorry for not being clear enough.
Fork
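
The objections in these comments (escaped quotes, quotes inside comments) can both be handled in a single pass if the pattern also matches comments and char literals and then leaves them alone. The following is a rough sketch under that assumption, not a full Java lexer: the names LiteralScanner and replaceLiterals are hypothetical, and text blocks and Unicode-escaped quotes are not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralScanner {
    // The scan runs left to right, so whichever construct starts first consumes
    // its whole extent. A quote inside a comment is therefore swallowed by the
    // comment alternative and never seen as the start of a string.
    static final Pattern TOKEN = Pattern.compile(
          "//[^\\n]*"                      // line comment
        + "|/\\*.*?\\*/"                   // block comment (needs DOTALL)
        + "|'(?:\\\\.|[^'\\\\])*'"         // char literal such as '"'
        + "|(\"(?:\\\\.|[^\"\\\\])*\")",   // group 1: string literal
        Pattern.DOTALL);

    // Replaces every string literal in the source text; comments and char
    // literals are copied through unchanged.
    static String replaceLiterals(String source, String replacement) {
        Matcher m = TOKEN.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String text = (m.group(1) != null) ? replacement : m.group();
            // quoteReplacement keeps '$' and '\' in the text from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String source = "String a = \"coke\"; // not \"this\"\n"
                      + "char q = '\"'; /* nor \"this\" */ String b = \"x\";";
        System.out.println(replaceLiterals(source, "\"Pepsi\""));
    }
}
```

Only the two real string literals ("coke" and "x") are replaced here; the quoted words inside the two comments and the '"' char literal are left as they were.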
A: 

Ok. So what you want is to search, within a String, for a sequence of characters starting and ending with double-quotes?

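    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String bar = "A \"car\"";  // runtime value: A "car"
    Pattern quoted = Pattern.compile("\".*?\"");  // reluctant: stops at the next quote
    Matcher matcher = quoted.matcher(bar);
    String result = matcher.replaceAll("\"bicycle\"");  // A "bicycle"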

Note the non-greedy .*? pattern.

Wangnick
And what if the String within the String also has quotes?
Fork
Yes, what then? How would you know where the inner string ends? In that case you have to make sure that quotes in the inner string are escaped when the outer string is constructed, handle that escaping in your replacement string, and unescape the result again when required. One possible way of escaping quotes is to double them.
Wangnick
This seems to have done what I intended. Many thanks.
Fork
Nitpick: it's a double quote, not an apostrophe.
Antal S-Z
One last question: what if the original String is like "A \"car\" and a \"boat\""? Would it match each of the two strings separately, or would it match both strings plus what's in the middle?
Fork
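
On that last question, a small sketch comparing the non-greedy pattern from the answer with its greedy counterpart (the class and method names are made up for illustration). The non-greedy .*? stops at the nearest closing quote, so each quoted string is matched on its own; the greedy .* runs to the last quote and takes the middle along with it.

```java
import java.util.regex.Pattern;

class GreedyVsReluctant {
    // Replaces every match of the given regex with "bicycle" (quotes included).
    static String replaceQuoted(String regex, String input) {
        return Pattern.compile(regex).matcher(input).replaceAll("\"bicycle\"");
    }

    public static void main(String[] args) {
        // Runtime value: A "car" and a "boat"
        String input = "A \"car\" and a \"boat\"";

        // Non-greedy: two separate matches.
        System.out.println(replaceQuoted("\".*?\"", input));
        // prints: A "bicycle" and a "bicycle"

        // Greedy: one match spanning from the first quote to the last.
        System.out.println(replaceQuoted("\".*\"", input));
        // prints: A "bicycle"
    }
}
```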
A: 

Use this:

String REGEX = "\"[^\"]*\"";

Tested with

String A = "I went to the store to buy a \"coke\" and a box of \"kleenex\"";
String B = A.replaceAll(REGEX,"Pepsi");

Yields the following 'B'

I went to the store to buy a Pepsi and a box of Pepsi
tucuxi
Try it on this input: `"Double-quote is \"here->\"<-here\""`.
seh
@seh, what would you consider a correct output for your example? The original question does not demand quotes-within-quotes, un-paired quotes, or even multiple-quoted-strings, for that matter...
tucuxi
I would expect `Double-quote is "Pepsi"`, by my reading of the question, because I take a "string literal" to mean any content that's valid in the host language syntax to define a string. You're right that the original question didn't ask for the coverage of the more difficult cases, mentioning just strings within strings, but I also think that that's what makes the problem interesting. I recall Jeffrey Friedl's *Mastering Regular Expressions* was legendary for finally laying down the ultimate double-quoted string matcher, not to mention his RFC 822 email address matcher. That's the benchmark.
seh
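
For reference, the double-quoted string matcher usually associated with that book is the "unrolled loop" form, "[^"\\]*(?:\\.[^"\\]*)*". Below is a sketch of it in Java, run against the input from the comment above, taken as it would appear in source text (backslashes included). The class and method names are hypothetical, and the pattern is written from memory of the technique rather than quoted from the book.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnrolledStringMatcher {
    // Regex as the engine sees it:  "[^"\\]*(?:\\.[^"\\]*)*"
    // A run of ordinary characters, then zero or more groups of
    // "escaped character + another run of ordinary characters".
    static final Pattern UNROLLED =
            Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    // The pattern from the answer, which does not know about escapes.
    static final Pattern NAIVE = Pattern.compile("\"[^\"]*\"");

    static List<String> matches(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Source text: "Double-quote is \"here->\"<-here\""
        String source = "\"Double-quote is \\\"here->\\\"<-here\\\"\"";

        // One match: the whole literal, escaped quotes included.
        System.out.println(matches(UNROLLED, source));

        // Two partial matches, because every escaped quote is taken as a delimiter.
        System.out.println(matches(NAIVE, source));
    }
}
```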