Is it possible to do:

@foo = getPileOfStrings();

if ($text =~ /@foo(.*?)@foo/)
{
 print "Sweet, you grabbed a $1! It lived between the foos!";
}

What I'm after is effectively $text =~ /($var1|$var2|$var3)(.*?)($var1.../, but I don't know how many values there are, and I don't know the values until run time.

Array interpolation into a set of ORs seemed to be the straightforward way to do this, but it doesn't seem to work right, and I'm getting into a twisty set of code...it's all alike!
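When an array is interpolated into a pattern, Perl joins its elements with the value of `$"`, which is a single space by default. That is why `/@foo/` looks for the strings separated by spaces rather than for any one of them. Below is a minimal sketch, using hypothetical stand-in values for `@foo` and `$text`, that temporarily sets `$"` to `|` so the interpolation becomes an alternation. It does not escape regex metacharacters in the strings; the answers below cover that.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for getPileOfStrings() and the input text.
my @foo  = qw(cat dog);
my $text = 'xx cat middle dog yy';
my $middle;

{
    # $" is the separator used when an array is interpolated into a
    # string or pattern; making it '|' turns "@foo" into cat|dog.
    local $" = '|';
    if ($text =~ /(?:@foo)(.*?)(?:@foo)/) {
        $middle = $1;    # (?:...) groups do not capture, so $1 is the middle
        print "Sweet, you grabbed \"$middle\"! It lived between the foos!\n";
    }
}
```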

+10  A: 

Use join and the qr// operator:

my $strings = join "|", getPileOfStrings();
my $re      = qr/$strings/; #compile the pattern

if ($text =~ /$re(.*?)$re/)

If you want the same string to delimit both ends of the text in the middle, capture it and refer back to it with \1 (the middle text is then in $2):

if ($text =~/($re)(.*?)\1/)
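A minimal runnable sketch of the two patterns, with `getPileOfStrings()` stubbed out using hypothetical values since its definition isn't shown; note that the non-greedy capture is spelled `(.*?)`. The first match stops at the nearest string from the pile, whichever one it is; the backreference version keeps going until the same string that opened the match appears again.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the question's getPileOfStrings().
sub getPileOfStrings { return qw(foo bar) }

my $strings = join "|", getPileOfStrings();
my $re      = qr/$strings/;    # compile the alternation once

my $text = 'xx foo one bar two foo yy';

# Any string from the pile may close the match.
my ($any) = $text =~ /$re(.*?)$re/;

# The closing delimiter must be the same string that opened it (\1).
my ($delim, $same) = $text =~ /($re)(.*?)\1/;

print "between any two: [$any]\n";
print "between two [$delim]: [$same]\n";
```

Because `qr//` wraps the alternation in its own non-capturing group, `$re` can be dropped into a larger pattern without the `|` splitting the surrounding regex.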

If the strings could contain characters that are special in Perl regexes, you may want to use map and quotemeta to escape them so they are matched literally:

my $strings = join "|", map quotemeta, getPileOfStrings();
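A small sketch of why the escaping matters, using hypothetical pile entries that contain metacharacters. Without `quotemeta`, `a.b` also matches `axb` because `.` matches any character; with it, only the literal string matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical strings containing regex metacharacters.
my @pile = ('a.b', 'c+');

my $raw  = join '|', @pile;                  # a.b|c+   (metacharacters active)
my $safe = join '|', map quotemeta, @pile;   # a\.b|c\+ (matched literally)

# Anchor the whole alternation so each test is an exact match.
my $raw_hit  = 'axb' =~ /^(?:$raw)$/  ? 1 : 0;
my $safe_hit = 'axb' =~ /^(?:$safe)$/ ? 1 : 0;
my $literal  = 'a.b' =~ /^(?:$safe)$/ ? 1 : 0;

print "raw matches 'axb':    $raw_hit\n";
print "quoted matches 'axb': $safe_hit\n";
print "quoted matches 'a.b': $literal\n";
```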

And, as Michael Carman points out, if getPileOfStrings() is not designed to return the strings in the order you want them matched, you may want to sort them so the longest strings come first in the alternation (in Perl 5, alternatives are tried from left to right and the first one that lets the whole match succeed wins, not the longest):

my $strings = join "|", map quotemeta,
    sort { length $b <=> length $a } getPileOfStrings();

Remember to sort before running quotemeta: "a..." (length 4) becomes "a\.\.\." (length 7), which is longer than "aaaaaa" (length 6), so sorting the escaped strings would compare the wrong lengths.
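A sketch of the full pipeline (sort longest-first, then quotemeta, then join), again with hypothetical input. When one string is a prefix of another, as `foo` is of `foobar`, the order decides which one the alternation reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pile in which one entry is a prefix of another.
my @pile = qw(foo foobar);
my $text = 'foobar';

# Original order: 'foo' is tried first and wins.
my $unsorted = join '|', map quotemeta, @pile;
my ($short)  = $text =~ /($unsorted)/;

# Longest first. sort runs before quotemeta, so it compares unescaped lengths.
my $sorted = join '|', map quotemeta,
    sort { length $b <=> length $a } @pile;
my ($long) = $text =~ /($sorted)/;

# Escaping changes lengths: 'a...' has 4 characters, its escaped form has 7.
my $plain_len  = length 'a...';
my $quoted_len = length quotemeta 'a...';

print "unsorted: $short\n";    # foo
print "sorted:   $long\n";     # foobar
print "length before/after quotemeta: $plain_len/$quoted_len\n";
```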

Chas. Owens
Ooo. That works! Spiffy!
Paul Nathan
The order of things within an alternation can be important. Unless `getPileOfStrings()` can be trusted to do it, you should sort the results by length with the longest values first.
Michael Carman
I was mentioning that on Sinan Ünür's answer at the same time you were mentioning it on mine (which was the same time he was adding it to his post). Great minds indeed.
Chas. Owens
If you two aren't careful, you'll devolve into an infinite recursion.
Paul Nathan
Definitely do use quotemeta, unless you specifically intend to treat the strings as regexes. Why trust? Quotemeta is cheap. And since performance was mentioned elsewhere: Perl 5.10 has a trie-izing stage in the regex engine that will optimize `(literal|literal|literal|...)` alternations. If you don't have 5.10 *and* you run into a performance problem, you could look at `Regexp::Trie` or `Regexp::Assemble`, but don't do it needlessly.
hobbs
+4  A: 

You can use Regex::PreSuf:

use Regex::PreSuf;
my $re = presuf(getPileOfStrings());

Before going ahead with this, you might want to think about what you want the following code to do:

#!/usr/bin/perl

use strict;
use warnings;

my @pile = qw(ar a);
my $string = 'ar5a';

my $pile = join '|', @pile;
my $re = qr/$pile/;

my ($captured) = $string =~ /$re(.*?)$re/;

print "$captured\n";

As written, this prints "5", because the alternation tries "ar" before "a". If you want $captured to contain "r5", sort @pile by the lengths of its elements (shortest first) before joining, as in

my $pile = join '|', sort { length $a <=> length $b } @pile;
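To compare the two orderings directly, this sketch runs the example input above through both sorts. Ascending length (the sort shown above) tries `a` before `ar` and captures `r5`; descending length, the longest-first order suggested in the comments on the other answer, captures `5`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @pile   = qw(ar a);
my $string = 'ar5a';

my $shortest_first = join '|', sort { length $a <=> length $b } @pile;  # a|ar
my $longest_first  = join '|', sort { length $b <=> length $a } @pile;  # ar|a

my %captured;
for my $pile ($shortest_first, $longest_first) {
    my $re = qr/$pile/;
    ($captured{$pile}) = $string =~ /$re(.*?)$re/;
    print "$pile: $captured{$pile}\n";
}
```

Neither order is "right" in general; which one you want depends on whether the delimiters should be as long or as short as possible.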
Sinan Ünür
Regex::PreSuf will result in a more efficient regex than the straight `join`, but it also looks like it might screw up lists that have been ordered to match the longest (or shortest) alternation: "The original order of the words is not necessarily respected"
Chas. Owens
@Chas. It looks like I was editing my response as you posted this comment.
Sinan Ünür
great minds and all of that
Chas. Owens