



I need a Perl regular expression to match a string. I'm assuming only double-quoted strings, that a \" is a literal quote character and NOT the end of the string, and that a \\ is a literal backslash character and should not escape a quote character. If it's not clear, some examples:

"\""    # string is 1 character long, contains double quote
"\\"    # string is 1 character long, contains backslash
"\\\""  # string is 2 characters long, contains backslash and double quote
"\\\\"  # string is 2 characters long, contains two backslashes

I need a regular expression that can recognize all 4 of these possibilities, and all other simple variations on those possibilities, as valid strings. What I have now is:


But that's not right - it won't match any of those except the first one. Can anyone give me a push in the right direction on how to handle this?

+7  A: 

How about this?


This matches zero or more of the following: a character that isn't a backslash or a quote, OR two backslashes, OR a backslash followed by a quote.
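
For anyone reading along, here is a minimal sketch of a pattern that fits that description, run against the four strings from the question. It is a reconstruction from the prose, not necessarily the exact regex posted; the `\A`/`\z` anchors, the non-capturing group, and the helper name `is_quoted_string` are my own assumptions.

```perl
use strict;
use warnings;

# Assumed reconstruction from the prose description; not the original code.
#   [^"\\]  any character that is neither a quote nor a backslash
#   \\\\    two backslashes (an escaped backslash)
#   \\"     a backslash followed by a quote (an escaped quote)
# \A and \z are added here so the whole input must be one string.
my $cal_style = qr/\A"(?:[^"\\]|\\\\|\\")*"\z/;

# Hypothetical helper: returns 1 if $text is a single valid quoted string.
sub is_quoted_string {
    my ($text) = @_;
    return $text =~ $cal_style ? 1 : 0;
}

# A single-quoted heredoc keeps every backslash literal.
my @candidates = split /\n/, <<'END';
"\""
"\\"
"\\\""
"\\\\"
"""
END

for my $candidate (@candidates) {
    printf "%-8s %s\n", $candidate,
        is_quoted_string($candidate) ? "valid" : "invalid";
}
```

The four strings from the question should come out valid, and the bare `"""` should be rejected because no alternative in the group can consume an unescaped quote.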

I guess I was wrong. Cool.
Paul Tomblin
Paul: strings can be matched by regexes, however parenthesised expressions (and anything else that can nest arbitrarily deep) cannot.
This regex has false positives on strings such as """
Leon Timmermans
Cal: I think you need to double all of those backslashes. (Maybe you already did, and SO stripped them out?)
It looks fine to me. In some languages doubled backslashes are necessary, but not in Perl.
Leon Timmermans
fyi, i did double the backslashes and SO stripped them
You need to "code-ify" the regex: either enclose it in `backticks`, or indent it four spaces and leave empty lines above and below it.
Alan Moore
@Cal: yes, that's happened to me too. The `backticks` cures that, as Alan suggested.
@Leon: By coincidence, Cal's original regex **as displayed** (i.e. with no doubled backslashes) was *also* valid Perl syntax, although it didn't do what he wanted -- e.g. it let through """ as you pointed out. The double-backslashed version now on display doesn't have that problem.
thanks for teaching me how to 'codify' :D
+10  A: 


This is almost the same as Cal's answer, but has the advantage of matching strings containing escape codes such as \n.

The ?: characters make the group non-capturing, so the contained expression is not saved as a backreference. They can be removed without changing what the pattern matches.
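
Since the code block above did not survive, here is a rough sketch of the difference being described. Both patterns are assumptions reconstructed from the descriptions in this thread, and the helper names `narrow_ok` and `generic_ok` are made up for the illustration.

```perl
use strict;
use warnings;

# Assumed patterns, reconstructed from the descriptions in this thread.
my $narrow  = qr/\A"(?:[^"\\]|\\\\|\\")*"\z/;  # only \\ and \" are escapes
my $generic = qr/\A"(?:[^\\"]|\\.)*"\z/;       # backslash plus any one character

# Hypothetical helpers returning 1 or 0.
sub narrow_ok  { my ($text) = @_; return $text =~ $narrow  ? 1 : 0 }
sub generic_ok { my ($text) = @_; return $text =~ $generic ? 1 : 0 }

# A single-quoted heredoc keeps every backslash literal.
my @samples = split /\n/, <<'END';
"\""
"\\\\"
"line\n"
"""
END

for my $sample (@samples) {
    printf "%-10s narrow=%d generic=%d\n",
        $sample, narrow_ok($sample), generic_ok($sample);
}
```

Only the generic pattern should accept `"line\n"`. One caveat: without the `/s` modifier, `.` does not match a newline, so a backslash followed by an actual line break is not treated as an escape here.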

+7  A: 

A generic solution (matching all backslashed characters):

/ \A "               # Start of string and opening quote
  (?:                #  Start group
    [^\\"]           #   Anything but a backslash or a quote
    |                #  or
    \\.              #   Backslash and anything
  )*                 # End of group
  " \z               # Closing quote and end of string
/x
Leon Timmermans
Though you may want to omit the `\A` and/or `\z` -- they imply that there can be nothing preceding or trailing the double-quoted string.
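
To illustrate that comment, here is a small sketch that uses the same core pattern without the anchors to pull every quoted string out of a longer line, then checks the anchored form against the same input. The helper name `extract_strings` and the sample line are assumptions for the example.

```perl
use strict;
use warnings;

# Same core as the answer above, without \A and \z.
my $core = qr/"(?:[^\\"]|\\.)*"/;

# Hypothetical helper: returns every double-quoted string found in $text.
sub extract_strings {
    my ($text) = @_;
    my @found = $text =~ /($core)/g;
    return @found;
}

# A single-quoted heredoc keeps every backslash literal.
my $source = <<'END';
say "hello", "a \"quoted\" word", "back\\slash";
END

print "$_\n" for extract_strings($source);

# With the anchors, the whole input would have to be one quoted string.
print "whole input is one string: ",
    ($source =~ /\A$core\z/ ? "yes" : "no"), "\n";
```

The unanchored form trusts that the first quote it sees really opens a string, so on malformed input it may start matching in the wrong place.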
+3  A: 

See Text::Balanced. It's better than reinventing the wheel. Use gen_delimited_pat to see the resulting pattern and learn from it.
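
A minimal sketch of that suggestion, assuming the default escape character (a backslash) is what you want. The helper name `first_quoted` and the sample line are made up for the example, and the exact text of the generated pattern may differ between versions of Text::Balanced, so it is printed rather than quoted here.

```perl
use strict;
use warnings;
use Text::Balanced qw(gen_delimited_pat);

# Ask Text::Balanced for a pattern matching text delimited by double quotes.
# With no second argument, the escape character defaults to a backslash.
my $pattern = gen_delimited_pat(q{"});
print "generated pattern: $pattern\n";

# Hypothetical helper: returns the first quoted string in $text, or undef.
sub first_quoted {
    my ($text) = @_;
    return $text =~ /($pattern)/ ? $1 : undef;
}

# A single-quoted heredoc keeps every backslash literal.
my $line = <<'END';
my $greeting = "say \"hi\" twice";
END

my $found = first_quoted($line);
print defined $found ? "matched: $found\n" : "no match\n";
```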

Hynek -Pichi- Vychodil

Regexp::Common is another useful tool to be aware of. It contains regexps for many common cases, including quoted strings:

use Regexp::Common;

my $str = '" this is a \" quoted string"';
if ($str =~ $RE{quoted}) {
  # do something
}
Rob Van Dam