While the various `map`/`for`-based solutions will work, they also run a separate regex pass over your string for each and every stopword. That is no big deal in the example given, but the total work grows with both the length of the text and the number of stopwords, so it can cause major performance problems as either one grows.
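For comparison, here is a minimal sketch of the per-stopword approach being described. It is not any particular answer's code, and the `\Q...\E` escaping and `\h*` are choices made for this sketch. The substitution runs once per stopword, and each run scans the whole text:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;

my @stopwords = qw( and the this that a an in to );
my $text = "Fourscore and seven years ago our fathers brought forth on this continent a new nation";

# One full pass over $text per stopword: N stopwords means N regex scans.
for my $word (@stopwords) {
    # \Q...\E escapes any regex metacharacters in the word;
    # the \b anchors stop "a" from matching inside "ago";
    # \h* also removes the spaces/tabs that follow the word.
    $text =~ s/\b\Q$word\E\b\h*//g;
}

say $text;
```

This prints `Fourscore seven years ago our fathers brought forth on continent new nation`.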
Jonathan Leffler and Robert P are on the right track with their suggestions of mashing all the stopwords together into a single regex, but a simple `join` of all the stopwords into one flat alternation is a crude approach and, again, becomes inefficient when the stopword list is long.
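A minimal sketch of that simple `join`: one flat alternation with one branch per stopword, applied in a single pass. Passing each word through `quotemeta` is an addition for this sketch, so that words containing regex metacharacters stay literal:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;

my @stopwords = qw( and the this that a an in to );
my $text = "dedicated to the proposition that all men are created equal";

# Escape each word and glue them into one flat alternation,
# equivalent to \b(?:and|the|this|that|a|an|in|to)\b
my $alternation = join '|', map { quotemeta } @stopwords;
my $re = qr/\b(?:$alternation)\b/;

# A single pass over the text handles every stopword,
# but each word is still its own separate branch of the pattern.
$text =~ s/$re\h*//g;
say $text;
```

This prints `dedicated proposition all men are created equal`.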
Enter Regexp::Assemble, which builds a much 'smarter' regex that handles all the matches at once, factoring shared prefixes (such as the `th` in `the`, `this` and `that`) into a single branch. I've used it to good effect with stopword lists of up to 1700 or so words:
```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;    # for say
use Regexp::Assemble;

my @stopwords = qw( and the this that a an in to );

my $whole_text = <<EOT;
Fourscore and seven years ago our fathers brought forth
on this continent a new nation, conceived in liberty, and
dedicated to the proposition that all men are created equal.
EOT

# Wrap the assembled pattern in \b...\b so stopwords only match as whole words.
my $ra = Regexp::Assemble->new(anchor_word_begin => 1, anchor_word_end => 1);
$ra->add(@stopwords);

# Show the combined pattern that Regexp::Assemble built.
say $ra->as_string;
say '---';

# Remove every stopword (plus any spaces/tabs after it) in a single pass.
my $re = $ra->re;
$whole_text =~ s/$re\h*//g;
say $whole_text;
```
Which outputs:
```
\b(?:t(?:h(?:at|is|e)|o)|a(?:nd?)?|in)\b
---
Fourscore seven years ago our fathers brought forth
on continent new nation, conceived liberty,
dedicated proposition all men are created equal.
```
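To confirm that the assembled pattern is just a more compact form of the flat alternation rather than a different filter, here is a small sketch that compares which whole words each one accepts. It does not need Regexp::Assemble installed because the pattern is copied from the `as_string` output above; the list of near-miss words is an arbitrary choice for this check:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;

my @stopwords = qw( and the this that a an in to );

# Pattern copied verbatim from the as_string output above.
my $assembled = qr/\b(?:t(?:h(?:at|is|e)|o)|a(?:nd?)?|in)\b/;

# The flat equivalent: one branch per (escaped) stopword.
my $flat  = join '|', map { quotemeta } @stopwords;
my $naive = qr/\b(?:$flat)\b/;

# The stopwords themselves plus near-misses that share their prefixes.
my @words = ( @stopwords, qw( ago ant than then these into tone inn thus ) );

# Record any word that one pattern accepts as a whole word and the other rejects.
my @mismatches;
for my $word (@words) {
    my $by_assembled = $word =~ /\A$assembled\z/ ? 1 : 0;
    my $by_naive     = $word =~ /\A$naive\z/     ? 1 : 0;
    push @mismatches, $word if $by_assembled != $by_naive;
}

if (@mismatches) {
    say "Patterns disagree on: @mismatches";
}
else {
    say 'Both patterns agree on all ', scalar(@words), ' words';
}
```

This only shows that the two patterns match the same words on a handful of inputs; it says nothing about speed, which is better measured with the core Benchmark module on realistic text and stopword lists.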