In other words, I have a string like:

"anything, escaped double-quotes: \", yep" anything here NOT to be matched.

How do I match everything inside the quotes?

I'm thinking

^"((?<!\\)[^"]+)"

But my head spins: should that be a positive or a negative lookbehind? Or does it work at all?

How do I match any characters except a double-quote NOT preceded by a backslash?

+2  A: 

No lookbehind necessary:

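"(\\"|[^"])*"

So: match quotes, and inside them, arbitrarily many times (*): either an escaped quote (\\") or any character except a quote ([^"]). The escaped quote has to be tried first; in a backtracking engine, [^"] would otherwise consume the backslash and the quote after it would be taken as the closing one.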
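A quick way to check this is to run it against the string from the question. The sketch below uses Python's `re` module, which is an assumption on my part since the question doesn't name an engine; the names `QUOTED` and `WRONG_ORDER` are made up for illustration, and the extra outer group is only there to capture the whole interior.

```python
import re

# Escaped quote first, then "any character except a quote".
QUOTED = re.compile(r'"((?:\\"|[^"])*)"')

# Same two alternatives in the opposite order, for comparison.
WRONG_ORDER = re.compile(r'"((?:[^"]|\\")*)"')

text = r'"anything, escaped double-quotes: \", yep" anything here NOT to be matched.'

inner = QUOTED.match(text).group(1)
short = WRONG_ORDER.match(text).group(1)

print(inner)  # everything between the outer quotes
print(short)  # stops at the backslash before the escaped quote
```

In a backtracking engine the order of the alternatives matters: with `[^"]` first, the backslash is consumed as an ordinary character and the escaped quote ends the match early.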

Konrad Rudolph
As chaos mentioned, you probably also want to handle double-backslashes separately (although that wasn't specified by the OP).
Adam Rosenfield
Hah, here I go again, over-complicating the problem. Didn't think of such a simple solution at all, thanks!
Core Xii
I'd probably use '\\.' to allow the backslash to escape any single following character, which prevents the regex from being confused by backslash, backslash, (close) double quote. Clearly, you need a more complex expression in place of the dot if you want to handle octal or hex escapes, or Unicode escapes, or ...
Jonathan Leffler
I only need to escape " and \. Thanks for your help.
Core Xii
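For reference, the `\\.` variant suggested in the comments might look like the sketch below. It is written for Python's `re` module (an assumption, since no engine was named), and `QUOTED_ANY_ESCAPE` is a hypothetical name. Note that the backslash is also excluded from the character class, so a backslash can only ever be consumed together with the character it escapes.

```python
import re

# Assumption: a backslash escapes whatever single character follows it.
# '.' does not match a newline unless re.DOTALL is passed.
QUOTED_ANY_ESCAPE = re.compile(r'"((?:[^"\\]|\\.)*)"')

samples = [
    r'"foo\\" tail',        # escaped backslash, then the closing quote
    r'"foo\\\"bar" tail',   # escaped backslash, escaped quote, more text
    r'"a \" b" tail',       # plain escaped quote
]

results = [QUOTED_ANY_ESCAPE.match(s).group(1) for s in samples]
for r in results:
    print(r)
```

Because each backslash is paired with the following character, the backslash, backslash, quote case closes the string where it should.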
A: 

"Not preceded by" translates directly to "negative lookbehind", so you'd want (?<!\\)".

Though here's a question that may ruin your day: what about the string "foo\\"? That is, a double-quote preceded by two backslashes, where in most escaping syntaxes the first backslash cancels the special meaning of the second, so the quote is not escaped at all.

That sort of thing is kind of why regexes aren't a substitute for parsers.
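To make the problem concrete, here is a rough sketch in Python (my choice of language, not the asker's). `LOOKBEHIND` is one way to write the "quote not preceded by a backslash" idea, and `read_quoted` is a hypothetical hand-written scanner that assumes a backslash escapes exactly one following character.

```python
import re

# Closing quote = first quote that is not immediately preceded by a backslash.
LOOKBEHIND = re.compile(r'"(.*?)(?<!\\)"')

def read_quoted(text):
    """Return the raw contents of a double-quoted string at the start of text,
    or None if there is no such string or it is unterminated."""
    if not text.startswith('"'):
        return None
    i = 1
    while i < len(text):
        ch = text[i]
        if ch == '\\':
            i += 2              # skip the backslash and the character it escapes
        elif ch == '"':
            return text[1:i]    # unescaped quote: end of the string
        else:
            i += 1
    return None                 # ran off the end without a closing quote

sample = r'"foo\\" bar "baz"'
overshoot = LOOKBEHIND.match(sample).group(1)
scanned = read_quoted(sample)

print(overshoot)  # runs past the real closing quote
print(scanned)    # stops at it
```

The lookbehind only sees one character back, so it cannot tell an escaping backslash from an escaped one; the scanner consumes escapes in pairs from the left and doesn't have that problem.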

chaos
I’m pretty sure though that a negative lookbehind is more expensive than my solution which uses a negative character class and an alternation. That’s a trivial case for regex engines.
Konrad Rudolph
Most likely, yeah.
chaos
What about this? `^"([^"]|(?<!\\)\\")"`
Core Xii
Great. Now three backslashes. :)
chaos
But that's already handled: in `"foo\\\"` the first double-backslash escapes to `\`, leaving the `\"` at the end as an escaped quote, so the string is unterminated and invalid. In the middle of the string, as in `"foo\\\"bar"`, it again parses correctly, doesn't it?
Core Xii