I'm using Java to scan chunks of JS code and extract the string literals that contain a given keyword.
After trying to come up with my own regexp for this, I ended up modifying this generalized string-literal matching regexp (the patterns are compiled with Pattern.COMMENTS in Java, so the spaces in them are ignored):
(["']) (?:\\?+.)*? \1
to the following:
(["']) (?:\\?+.)*? keyword (?:\\?+.)*? \1
The test cases:
var v1 = "test";
var v2 = "testkeyword";
var v3 = "test"; var v4 = "testkeyword";
The regexp correctly doesn't match line 1 and correctly matches line 2.
However, in line 3, instead of just matching "testkeyword", it matches the chunk
"test"; var v4 = "testkeyword"
which is wrong - the regexp matched the first double quote but did not terminate at the second double quote, and instead ran on all the way to the closing quote of the second literal.
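For reference, here is a minimal Java harness that reproduces the behaviour described above. The class name KeywordLiteralRepro and the helper firstMatch are only made up for this illustration; the pattern itself is the modified one shown earlier.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeywordLiteralRepro {
    // The modified pattern. Pattern.COMMENTS makes the engine ignore the
    // spaces that were added to the pattern for readability.
    static final Pattern KEYWORD_LITERAL = Pattern.compile(
            "([\"']) (?:\\\\?+.)*? keyword (?:\\\\?+.)*? \\1",
            Pattern.COMMENTS);

    // Hypothetical helper: returns the first match in the input,
    // or null when the pattern does not match anywhere.
    static String firstMatch(String js) {
        Matcher m = KEYWORD_LITERAL.matcher(js);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "var v1 = \"test\";",
            "var v2 = \"testkeyword\";",
            "var v3 = \"test\"; var v4 = \"testkeyword\";"
        };
        for (String line : lines) {
            System.out.println(line + "  ->  " + firstMatch(line));
        }
        // Line 1 -> null
        // Line 2 -> "testkeyword"
        // Line 3 -> "test"; var v4 = "testkeyword"   (the unwanted match)
    }
}
```

The third line is the problem case: the lazy group after the opening quote is free to consume the closing quote of "test" while it keeps looking for the keyword.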
Does anyone have any ideas on how to fix this?
PS: Please keep in mind that the regexp has to correctly handle escaped single and double quote characters inside string literals (which the generalized matcher already did).
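To show what I mean by handling escaped quotes, here is a small sketch that runs the generalized matcher on two literals containing escaped quote characters. The class name GeneralLiteralCheck, the helper firstLiteral and the sample inputs are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GeneralLiteralCheck {
    // The generalized string-literal pattern, compiled with Pattern.COMMENTS
    // so that the spaces in it are ignored.
    static final Pattern STRING_LITERAL = Pattern.compile(
            "([\"']) (?:\\\\?+.)*? \\1", Pattern.COMMENTS);

    // Hypothetical helper: returns the first string literal found,
    // or null when there is none.
    static String firstLiteral(String js) {
        Matcher m = STRING_LITERAL.matcher(js);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // JS source: var a = "say \"keyword\" now";
        System.out.println(firstLiteral("var a = \"say \\\"keyword\\\" now\";"));
        // prints: "say \"keyword\" now"

        // JS source: var b = 'it\'s';
        System.out.println(firstLiteral("var b = 'it\\'s';"));
        // prints: 'it\'s'
    }
}
```

The possessive \\?+ consumes a backslash together with the character after it, so an escaped quote can never be taken as the closing quote. Any fix needs to keep that property.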