I need a Perl regular expression to match a string literal. I'm assuming only double-quoted strings, that \" is a literal quote character and NOT the end of the string, and that \\ is a literal backslash character and does not escape a following quote character. In case that's not clear, some examples:

"\""    # string is 1 character long, contains double quote
"\\"    # string is 1 character long, contains backslash
"\\\""  # string is 2 characters long, contains backslash and double quote
"\\\\"  # string is 2 characters long, contains two backslashes

I need a regular expression that can recognize all 4 of these possibilities, and all other simple variations on those possibilities, as valid strings. What I have now is:

/".*[^\\]"/

But that's not right - it won't match any of those except the first one. Can anyone give me a push in the right direction on how to handle this?

+7  A: 

How about this?

/"([^\\"]|\\\\|\\")*"/

It matches an opening quote, then zero or more of the following: a character that isn't a backslash or a quote, OR two backslashes, OR a backslash followed by a quote, and finally a closing quote.
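As a quick check, here is a minimal sketch that runs this pattern against the four examples from the question. The `\A` and `\z` anchors and the helper name `is_quoted_string` are additions for the test, not part of the answer itself.

```perl
use strict;
use warnings;

# The pattern from this answer, wrapped in \A ... \z (an addition for this
# check) so that the whole input has to be a single quoted string.
my $quoted = qr/\A"([^\\"]|\\\\|\\")*"\z/;

# Hypothetical helper: returns 1 if $text is a valid quoted string, else 0.
sub is_quoted_string {
    my ($text) = @_;
    return $text =~ $quoted ? 1 : 0;
}

# The four examples from the question. A single-quoted heredoc keeps every
# backslash exactly as typed.
my @examples = split /\n/, <<'END';
"\""
"\\"
"\\\""
"\\\\"
END

for my $example (@examples) {
    printf "%-8s %s\n", $example, is_quoted_string($example) ? 'valid' : 'invalid';
}
```

All four examples should be reported as valid, while an unterminated string such as a quote, a backslash and a quote should not.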

Cal
I guess I was wrong. Cool.
Paul Tomblin
Paul: strings can be matched by regexes, however parenthesised expressions (and anything else that can nest arbitrarily deep) cannot.
j_random_hacker
This regex has false positives on strings such as """
Leon Timmermans
Cal: I think you need to double all of those backslashes. (Maybe you already did, and SO stripped them out?)
j_random_hacker
It looks fine to me. In some languages doubled backslashes are necessary, but not in Perl.
Leon Timmermans
FYI, I did double the backslashes and SO stripped them.
Cal
You need to "code-ify" the regex: either enclose it in `backticks`, or indent it four spaces and leave empty lines above and below it.
Alan Moore
@Cal: yes, that's happened to me too. The `backticks` cure that, as Alan suggested.
j_random_hacker
@Leon: By coincidence, Cal's original regex **as displayed** (i.e. with no doubled backslashes) was *also* valid Perl syntax, although it didn't do what he wanted -- e.g. it let through """ as you pointed out. The double-backslashed version now on display doesn't have that problem.
j_random_hacker
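To make the difference concrete, here is a small sketch comparing the two forms on the input `"""`. The stripped form is reconstructed on the assumption that every doubled backslash was collapsed to a single one, and both patterns are anchored with `\A` and `\z` for the comparison.

```perl
use strict;
use warnings;

# Assumed reconstruction of the regex as first displayed (backslashes halved).
# Here [^\"] means "anything but a quote", \\ is a backslash and \" is a quote,
# so the group ends up accepting any character at all.
my $as_displayed = qr/\A"([^\"]|\\|\")*"\z/;

# The regex with doubled backslashes, as intended.
my $as_intended = qr/\A"([^\\"]|\\\\|\\")*"\z/;

my $three_quotes = '"""';
my $displayed_accepts = $three_quotes =~ $as_displayed ? 1 : 0;
my $intended_accepts  = $three_quotes =~ $as_intended  ? 1 : 0;

print "as displayed: ", ($displayed_accepts ? 'accepts' : 'rejects'), " \"\"\"\n";
print "as intended:  ", ($intended_accepts  ? 'accepts' : 'rejects'), " \"\"\"\n";
```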
thanks for teaching me how to 'codify' :D
Cal
+10  A: 

/"(?:[^\\"]|\\.)*"/

This is almost the same as Cal's answer, but has the advantage of matching strings containing escape codes such as \n.

The ?: characters make the group non-capturing, so the contained expression is not saved in a capture variable such as $1. They can be removed without changing what the regex matches.
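A minimal sketch of the difference, assuming an input that contains a literal backslash followed by n. The variable names are made up for illustration, and both patterns are anchored with `\A` and `\z` so the whole input must match.

```perl
use strict;
use warnings;

my $cal_style  = qr/\A"([^\\"]|\\\\|\\")*"\z/;  # only \\ and \" count as escapes
my $any_escape = qr/\A"(?:[^\\"]|\\.)*"\z/;     # backslash plus any one character

# Single quotes keep \n as two characters: a backslash and the letter n.
my $input = '"line one\nline two"';

my $cal_ok = $input =~ $cal_style  ? 1 : 0;
my $any_ok = $input =~ $any_escape ? 1 : 0;
print "two-escape pattern: ", ($cal_ok ? 'match' : 'no match'), "\n";
print "any-escape pattern: ", ($any_ok ? 'match' : 'no match'), "\n";

# One capturing group around the whole body extracts the contents; the inner
# (?:...) keeps the alternation from creating a second capture.
my ($body) = $input =~ /"((?:[^\\"]|\\.)*)"/;
print "body: $body\n";
```

Note that without the /s modifier, `.` does not match a newline, so a backslash followed by an actual line break would not be accepted by this form.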

j_random_hacker
+7  A: 

A generic solution (matching all backslashed characters):

/ \A "               # Start of string and opening quote
  (?:                #  Start group
    [^\\"]           #   Anything but a backslash or a quote
    |                #  or
    \\.              #   Backslash and anything
  )*                 # End of group
  " \z               # Closing quote and end of string
  /xms
Leon Timmermans
Though you may want to omit the `\A` and/or `\z` -- they imply that there can be nothing preceding or trailing the double-quoted string.
j_random_hacker
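A short sketch of what the anchors change, using a hypothetical line of source text that contains a quoted string in the middle. The variable names are illustrative only.

```perl
use strict;
use warnings;

# Same pattern with and without the \A / \z anchors.
my $anchored   = qr/ \A " (?: [^\\"] | \\. )* " \z /xms;
my $unanchored = qr/    " (?: [^\\"] | \\. )* "    /xms;

# Hypothetical input: a quoted string embedded in a longer line. Single quotes
# keep the backslashes and the $ sign literal.
my $line = 'print "say \"hi\"" if $ok;';

# The anchored form requires the whole input to be the quoted string.
my $whole_line_is_string = $line =~ $anchored ? 1 : 0;

# The unanchored form finds the first quoted string anywhere in the input.
my ($found) = $line =~ /($unanchored)/;

print "anchored match: ", ($whole_line_is_string ? 'yes' : 'no'), "\n";
print "found: $found\n" if defined $found;
```

So the anchored version suits validation of a complete value, while the unanchored one suits scanning for strings inside larger text.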
+3  A: 

See Text::Balanced. It's better than reinventing the wheel. Use gen_delimited_pat to see the resulting pattern and learn from it.
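A minimal sketch of that suggestion, assuming the version of Text::Balanced that ships with your Perl. The exact text of the generated pattern may differ between versions, so the code prints it rather than assuming it; the anchors and the test data are additions.

```perl
use strict;
use warnings;
use Text::Balanced qw(gen_delimited_pat);

# Ask for a pattern matching text delimited by double quotes. With no second
# argument, the backslash is used as the escape character.
my $pattern = gen_delimited_pat(q{"});
print "generated pattern: $pattern\n";

# The function returns a string, so it can be interpolated into a larger regex.
my $quoted = qr/\A$pattern\z/;

# The four examples from the question, kept literal by a single-quoted heredoc.
my @examples = split /\n/, <<'END';
"\""
"\\"
"\\\""
"\\\\"
END

my @accepted = grep { $_ =~ $quoted } @examples;
printf "%d of %d examples accepted\n", scalar @accepted, scalar @examples;
```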

Hynek -Pichi- Vychodil
A: 

Regexp::Common is another useful tool to be aware of. It contains regexps for many common cases, including quoted strings:

use Regexp::Common;

my $str = '" this is a \" quoted string"';
if ($str =~ $RE{quoted}) {
  # do something
}
Rob Van Dam