/^"((?:[^"]|\\.)*)"/

Against this string:

"quote\_with\\escaped\"characters" more

It only matches up to the \", even though I've clearly defined \ as an escape character (and it handles \_ and \\ fine...).

A: 

It works correctly if you flip the order of your two alternatives:

/^"((?:\\.|[^"])*)"/

The problem is that in the original order [^"] eats the important \ character on its own before the engine ever tries \\. against \". It worked before for \\ and \_ only because each character in those pairs gets matched individually by your [^"].
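The effect of the ordering can be checked directly. Here is a minimal sketch in Python; the question does not say which regex engine is in use, so Python's `re` module is an assumption, and the names `subject`, `original`, and `flipped` are made up for illustration.

```python
import re

# The test string from the question, as a raw string so that Python
# itself does not interpret the backslashes.
subject = r'"quote\_with\\escaped\"characters" more'

# Original order: [^"] is tried first and consumes a backslash on its own.
original = re.compile(r'^"((?:[^"]|\\.)*)"')

# Flipped order: \\. is tried first, so a backslash takes the next
# character along with it.
flipped = re.compile(r'^"((?:\\.|[^"])*)"')

original_match = original.match(subject).group()
flipped_match = flipped.match(subject).group()

print(original_match)  # "quote\_with\\escaped\"
print(flipped_match)   # "quote\_with\\escaped\"characters"
```

Alternation in a backtracking engine tries the branches left to right, which is why the order matters here.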

VoteyDisciple
Simple mistake with the order of things, brilliant. Thanks!
Core Xii
A: 

Using Python with raw-string literals to ensure no further interpretation of escape sequences is taking place, the following variant does work:

import re

x = re.compile(r'^"((?:[^"\\]|\\.)*)"')

s = r'"quote\_with\\escaped\"characters" more'

mo = x.match(s)
print(mo.group())

This emits "quote\_with\\escaped\"characters". I believe that in your version (which also cuts the match short if substituted in here) the "not a double quote" subexpression ([^"]) is swallowing the backslashes that you intend to escape the immediately following characters. All I'm doing here is ensuring that such backslashes are not swallowed in this way and, as I said, it seems to work with this change.
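Excluding the backslash from the character class also behaves differently from merely reordering the alternatives when the input has no real closing quote. A minimal sketch, assuming Python's `re` module; the names `unterminated`, `reordered`, and `exclusive` are illustrative only:

```python
import re

# A string whose only candidate closing quote is escaped.
unterminated = r'"Wrong syntax \"'

# Alternatives reordered: [^"] can still match a lone backslash
# once the engine backtracks.
reordered = re.compile(r'^"((?:\\.|[^"])*)"')

# Backslash excluded from the class: it can only be consumed by \\.
exclusive = re.compile(r'^"((?:[^"\\]|\\.)*)"')

reordered_result = reordered.match(unterminated)
exclusive_result = exclusive.match(unterminated)

print(reordered_result.group())  # "Wrong syntax \"
print(exclusive_result)          # None
```

With the reordered pattern the engine backtracks, lets [^"] take the backslash, and then treats the escaped quote as the closing one. With the backslash excluded from the class there is no such fallback, so the match fails.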

Alex Martelli
A: 

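I don't intend to confuse, this is just some additional information I've played around with. The regexp below (PCRE) tries not to match wrong syntax (e.g. a string ending with \") and can be used with either ' or " as the delimiter. Note that it requires at least one escaped quote inside the string.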

/('|").*\\\1.*?[^\\]\1/

To use it with PHP:

<?php if (preg_match('/(\'|").*\\\\\1.*?[^\\\\]\1/', $subject)) return true; ?>

For:

"quote\_with\\escaped\"characters"  "aaa"
'just \'another\' quote "example\"'
"Wrong syntax \"
"No escapes, no match here"

This matches only:

"quote\_with\\escaped\"characters" and
'just \'another\' quote "example\"'
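
The same four inputs can be checked outside PHP as well. Below is a rough port to Python, assuming its `re` module treats this particular pattern the same way PCRE does; the names `pattern`, `samples`, and `results` are illustrative only.

```python
import re

# Same pattern as above, in a triple-quoted raw string so that both
# quote characters can appear unescaped.
pattern = re.compile(r'''('|").*\\\1.*?[^\\]\1''')

samples = [
    r'''"quote\_with\\escaped\"characters"  "aaa"''',
    r"""'just \'another\' quote "example\"'""",
    r'''"Wrong syntax \"''',
    r'''"No escapes, no match here"''',
]

# Matched text for each sample, or None where nothing matches.
results = []
for sample in samples:
    found = pattern.search(sample)
    results.append(found.group() if found else None)

for sample, result in zip(samples, results):
    print(sample, '->', result)
```

The last two samples give None: the third has nothing after its escaped quote to close the string, and the fourth contains no escaped quote at all, which the pattern requires.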
noomz