Suppose I have a file containing lines I'm trying to match against:

foo
quux
bar

In my code, I have another array:

foo
baz
quux

Let's say we iterate through the file, calling each line $word, and check it against the internal list, @arr.

if( grep {$_ =~ m/^$word$/i} @arr)

This works for plain words, but if the file contains a line such as fo. then the . acts as a wildcard in the regex, so fo. matches foo, which is not acceptable.

This is of course because Perl is interpolating the variable into a regex.

The question:

How do I force Perl to use the variable literally?

+4  A: 

Use quotemeta:

Returns the value of EXPR with all non-"word" characters backslashed.

http://perldoc.perl.org/functions/quotemeta.html
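
As a rough sketch of how quotemeta could be applied to the grep from the question (the sample data is taken from the question; the variable names $quoted, @unescaped and @escaped are made up for illustration):

```perl
use strict;
use warnings;

my @arr  = qw(foo baz quux);   # internal list from the question
my $word = 'fo.';              # the problematic line from the file

# quotemeta backslashes every non-"word" character, so "." becomes "\."
my $quoted = quotemeta $word;

my @unescaped = grep { $_ =~ m/^$word$/i }   @arr;   # "." acts as a wildcard
my @escaped   = grep { $_ =~ m/^$quoted$/i } @arr;   # "." has to match literally

print "unescaped: @unescaped\n";
print "escaped:   @escaped\n";
```

With the unescaped pattern foo is matched by fo. while the escaped pattern matches nothing in this list.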

Paul Nathan
+7  A: 

Use \Q...\E directly in the pattern. The variable is interpolated first, and every special character in its value is then escaped:

if( grep {$_ =~ m/^\Q$word\E$/i} @arr)
Ivan Nevostruev
What if `$word = 'fo\E.'`?
Ryan Thompson
Then the regexp will be something like "m/^fo\\E\.$/i". See the description of the `\Q` meta-symbol at http://perldoc.perl.org/perlfaq6.html
Ivan Nevostruev
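
The point made in the comments above can be checked with a small script. This is only a sketch; the test strings and the names $literal_hit and $wildcard_hit are invented for the example:

```perl
use strict;
use warnings;

my $word = 'fo\E.';   # single quotes: backslash, "E" and "." are literal characters

# \Q...\E is applied to the interpolated value, so the "\E" inside
# $word is quoted like any other text and does not end the quoting.
my $literal_hit  = ( 'fo\E.' =~ m/^\Q$word\E$/i ) ? 1 : 0;
my $wildcard_hit = ( 'fo\Ex' =~ m/^\Q$word\E$/i ) ? 1 : 0;

print "quoted pattern text: ", quotemeta($word), "\n";
print "literal_hit=$literal_hit wildcard_hit=$wildcard_hit\n";
```

Only the string that is character-for-character equal to $word matches; the trailing . is not treated as a wildcard.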
+7  A: 

The correct answer is: don't use regexps. I'm not saying regexps are bad, but using them for what amounts to a simple equality check is overkill.

Use: grep { lc($_) eq lc($word) } @arr and be happy.
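
A minimal sketch of that comparison in a loop over the file's lines. Here @file_lines stands in for the lines read from the file and @matched is a name chosen for this example; neither appears in the question:

```perl
use strict;
use warnings;

my @arr        = qw(foo baz quux);          # internal list from the question
my @file_lines = qw(foo quux bar fo. FOO);  # hypothetical file contents

# Keep each line that equals some element of @arr, ignoring case.
# The inner grep runs in scalar context and returns the number of hits.
my @matched = grep {
    my $word = $_;
    grep { lc($_) eq lc($word) } @arr;
} @file_lines;

print "matched: @matched\n";
```

Since no regex is involved, fo. is compared as a plain string and does not match foo.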

depesz
Good point. The regex solution is a remnant of older and more complicated code.
Paul Nathan
+6  A: 

From perlfaq6's answer to How do I match a regular expression that's in a variable?:


We don't have to hard-code patterns into the match operator (or anything else that works with regular expressions). We can put the pattern in a variable for later use.

The match operator is a double quote context, so you can interpolate your variable just like a double quoted string. In this case, you read the regular expression as user input and store it in $regex. Once you have the pattern in $regex, you use that variable in the match operator.

chomp( my $regex = <STDIN> );

if( $string =~ m/$regex/ ) { ... }

Any regular expression special characters in $regex are still special, and the pattern still has to be valid or Perl will complain. For instance, in this pattern there is an unpaired parenthesis.

my $regex = "Unmatched ( paren";

"Two parens to bind them all" =~ m/$regex/;

When Perl compiles the regular expression, it treats the parenthesis as the start of a memory match. When it doesn't find the closing parenthesis, it complains:

Unmatched ( in regex; marked by <-- HERE in m/Unmatched ( <-- HERE  paren/ at script line 3.

You can get around this in several ways depending on our situation. First, if you don't want any of the characters in the string to be special, you can escape them with quotemeta before you use the string.

chomp( my $regex = <STDIN> );
$regex = quotemeta( $regex );

if( $string =~ m/$regex/ ) { ... }

You can also do this directly in the match operator using the \Q and \E sequences. The \Q tells Perl where to start escaping special characters, and the \E tells it where to stop (see perlop for more details).

chomp( my $regex = <STDIN> );

if( $string =~ m/\Q$regex\E/ ) { ... }

Alternately, you can use qr//, the regular expression quote operator (see perlop for more details). It quotes and perhaps compiles the pattern, and you can apply regular expression flags to the pattern.

chomp( my $input = <STDIN> );

my $regex = qr/$input/is;

$string =~ m/$regex/  # same as m/$input/is;

You might also want to trap any errors by wrapping an eval block around the whole thing.

chomp( my $input = <STDIN> );

eval {
    if( $string =~ m/\Q$input\E/ ) { ... }
    };
warn $@ if $@;

Or...

my $regex = eval { qr/$input/is };
if( defined $regex ) {
    $string =~ m/$regex/;
    }
else {
    warn $@;
    }
brian d foy
Is the eval in the second-to-last code example there to trap errors in "{...}", or could something go wrong in "if ( $string =~ m/\Q$input\E/ )" too?
sid_com
The eval will catch all the errors in its block, but in terms of this question it's catching an error in the explicit code you see in the match operator.
brian d foy
+2  A: 

I don't think you want a regex in this case since you aren't matching a pattern. You're looking for a literal sequence of characters that you already know. Build a hash with the values to match and use that to filter @arr:

 open my $fh, '<', $filename or die "...";
 my %hash = map { chomp; lc($_), 1 } <$fh>;

 foreach my $item ( @arr ) 
      {
      next unless exists $hash{ lc($item) };
      print "I matched [$item]\n";
      }
brian d foy