I'm having a hard time determining which characters must be escaped when using Perl's qr{} construct.

I'm attempting to create a multi-line precompiled regex for text that contains a myriad of characters that normally need escaping (#*.>:[]) and that also contains another precompiled regex. Additionally, I need to match as strictly as possible for testing purposes.

my $output = q{# using defaults found in .config
*
*
Options:
  1. opt1
> 2. opt2
choice[1-2?]: };

my $sc = qr{(>|\s)}smx;
my $re = qr{# using defaults found in .config
*
*
Options:
$sc 1. opt1
$sc 2. opt2
choice[1-2?]: }mx;

if ( $output =~ $re ) {
  print "OK!\n";
}
else {
  print "D'oh!\n";
}

Error:

Quantifier follows nothing in regex; marked by <-- HERE in m/# using defaults found in .config
* <-- HERE 
*
Options:
(?msx-i:(>|\s)) 1. opt1
(?msx-i:(>|\s)) 2. opt2
choice[1-2?]: / at ./so.pl line 14.

Escaping the asterisks results in a failed match (the D'oh! output). Escaping the other pesky characters also results in a failed match. I could keep trying different combinations of what to escape, but there are a lot of variations here, and I'm hoping someone can provide some insight.

+12  A: 

You have to escape the delimiter for qr//, and you have to escape any regex metacharacters that you want to use as literals. If you want those to be literal *'s, you need to escape them since the * is a regex quantifier.

Your problem here is the various regex flags that you've added. The /m doesn't do anything because you don't use the beginning- or end-of-string anchors (^, $). The /s doesn't do anything because you don't use the wildcard . metacharacter. The /x makes all of the whitespace in your regex meaningless, and it turns that line with the # into a regex comment.
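As a small illustration of the /x point, here is a minimal sketch (the sample string and the variable names $plain and $extended are made up for this example, not taken from the question). It compiles the same pattern text with and without /x:

```perl
use strict;
use warnings;

my $text = "a b # c";

# Without /x, the spaces and the '#' in the pattern are literal characters.
my $plain = qr{a b # c};

# With /x, unescaped whitespace is ignored and '#' starts a comment,
# so this pattern is effectively just /ab/.
my $extended = qr{a b # c}x;

print "plain matches 'a b # c': ",    ( $text =~ $plain    ? "yes" : "no" ), "\n";
print "extended matches 'a b # c': ", ( $text =~ $extended ? "yes" : "no" ), "\n";
print "extended matches 'ab': ",      ( "ab"  =~ $extended ? "yes" : "no" ), "\n";
```

The first and third checks should report a match and the second should not, which is the same effect that turned the first line of the original pattern into a comment.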

This is what you want, with regex flags removed and the proper things escaped:

my $sc = qr{(>|\s)};

my $re = qr{# using defaults found in \.config
\*
\*
Options:
$sc 1\. opt1
$sc 2\. opt2
choice\[1-2\?]: };

Although Damian Conway tells people in Perl Best Practices to always put these options on their regexes, you now see why he's wrong. You should only add them when you want what they do, and you should only add things when you know what they do. :)

Here's what you might do if you want to use /x. You have to escape any literal whitespace, you need to denote the line endings somehow, and you have to escape the literal # character. What was readable before is now a mess:

my $sc  = qr{(>|\s)};
my $eol = qr{[\r\n]+};

my $re  = qr{\# \s+ using \s+ defaults \s+ found \s+ in \s+ \.config $eol
\*                    $eol
\*                    $eol
Options:              $eol
$sc \s+ 1\. \s+ opt1   $eol
$sc \s+ 2\. \s+ opt2   $eol
choice\[1-2\?]: \s+
}x;

if ( $output =~ $re ) {
  print "OK!\n";
}
else {
  print "D'oh!\n";
}
brian d foy
Argh! My understanding of what 's' and 'x' did was the inverse of reality. Hence the 's' missing from $re. But yes, I blame PBP here as well. :)
mnology
The book explains what the options do and why to use them... you can't really blame the book for this. :)
Brian Carper
I can blame the book. It says "Always use the /x flag" (p 236) and "Always use the /m flag" (p 237). The recommendation of "Always" is wrong.
brian d foy
Blame solely lies with me :). A quick edit to my .perlcriticrc should remedy this.
mnology
+5  A: 

Sounds like what you really want is Expect, but the thing you are most immediately looking for is the quotemeta operator which escapes all characters that have special meanings to a regex.

To answer your question directly, however: in addition to the closing delimiter (in this case }), you need to escape, at a minimum, the metacharacters .[$()|*+?{\ and ^
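One possible way to apply quotemeta to the text from the question is the \Q...\E escape, which runs quotemeta over whatever is interpolated between the two markers. This is only a sketch: splitting the expected text into $head, $opt1 and $opt2 is an assumption made for this example, and the \A and \z anchors are added here to keep the match strict.

```perl
use strict;
use warnings;

my $output = q{# using defaults found in .config
*
*
Options:
  1. opt1
> 2. opt2
choice[1-2?]: };

# Literal pieces of the expected text, split around the two positions
# where the selection marker ('>' or whitespace) may appear.
# These variable names are hypothetical, chosen for this sketch.
my $head = "# using defaults found in .config\n*\n*\nOptions:\n";
my $opt1 = " 1. opt1\n";
my $opt2 = " 2. opt2\nchoice[1-2?]: ";

my $sc = qr{(>|\s)};

# \Q...\E applies quotemeta to the interpolated strings, so every
# character in them is matched literally. No /x flag is used.
my $re = qr{\A\Q$head\E$sc\Q$opt1\E$sc\Q$opt2\E\z};

print $output =~ $re ? "OK!\n" : "D'oh!\n";
```

Keeping the literal text in ordinary strings means none of it has to be escaped by hand; only the parts that are meant to be regex syntax ($sc and the anchors) sit outside \Q...\E.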

geocar
Actually, this is being used in conjunction with Expect and Test::More. I just pared down the code for example's sake.
mnology
+2  A: 

Like brian said, you must escape the delimiter and regex metacharacters. Note that when using qr//x (which you are), you must also escape whitespace characters and # (which is a comment marker). You probably don't actually want to use /x here. If you want to be safe, you can escape any non-alphanumeric character.

cjm
A: 

Any idea what this function does?

split(qr/(?<!:)\s{2,}/,$msg)

This response has nothing to do with this thread, as far as I can tell. I think you meant to ask a new question: http://stackoverflow.com/questions/ask
Alan Moore