ansaurus

Question

How can I substitute one substring for another in Perl?

Answer 1

A:

Build a hash of the pairs. Then split the target string into word tokens, and check each token against the keys in the hash. If it's present, replace it with the value of that key.
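
A minimal sketch of this approach, assuming a "token" is a run of word characters. The sub name replace_words and the sample pairs are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical replacement pairs, for illustration only.
my %replace = (foo => 'baz', bar => 'quux');

# Split the string into word and non-word tokens, then swap any token
# that is a key in %$map for the corresponding value.
sub replace_words {
    my ($text, $map) = @_;
    # The capturing group makes split return the separators too,
    # so whitespace and punctuation survive the round trip.
    my @tokens = split /(\W+)/, $text;
    return join '', map { exists $map->{$_} ? $map->{$_} : $_ } @tokens;
}

print replace_words("The food is under the bar in the barn.", \%replace), "\n";
# prints: The food is under the quux in the barn.
```

Because whole tokens are compared against the hash keys, "food" and "barn" are left alone here, which differs from a plain substring replacement.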

Amber 2010-09-22 16:36:49

Answer 2

+6 A:

#! /usr/bin/perl

use warnings;
use strict;

my %replace = (
  "foo" => "baz",
  "bar" => "quux",
);

my $to_replace = qr/@{["(" .
                       join("|" => map quotemeta($_), keys %replace) .
                       ")"]}/;

while (<DATA>) {
  s/$to_replace/$replace{$1}/g;
  print;
}

__DATA__
The food is under the bar in the barn.

The @{[...]} bit may look strange. It's a hack to interpolate generated content inside quote and quote-like operators. The result of the join goes inside the anonymous array-reference constructor [] and is immediately dereferenced thanks to @{}.

If all that seems too wonkish, it's the same as

my $search = join "|" => map quotemeta($_), keys %replace;
my $to_replace = qr/($search)/;

minus the temporary variable.
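
To see the @{[...]} trick in isolation, here is a small sketch outside of any regex. The variable names are only for illustration:

```perl
use strict;
use warnings;

my @keys = qw(foo bar);

# [ ... ] builds an anonymous array holding the result of the expression,
# and @{ ... } dereferences it so the string interpolates its contents.
my $inline = "alternatives: @{[ join '|', @keys ]}";

# The same result with a temporary variable.
my $joined   = join '|', @keys;
my $two_step = "alternatives: $joined";

print "$inline\n";     # prints: alternatives: foo|bar
print "$two_step\n";   # prints: alternatives: foo|bar
```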

Note the use of quotemeta—thanks Ivan!—which escapes the first string of each pair so the regular-expression engine will treat them as literal strings.
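
A short sketch of what quotemeta buys you, using a made-up key that contains a regex metacharacter:

```perl
use strict;
use warnings;

my $key = 'a.c';   # hypothetical key; '.' is a regex metacharacter

my $raw     = qr/$key/;       # '.' matches any character
my $escaped = qr/\Q$key\E/;   # \Q...\E applies quotemeta during interpolation

print quotemeta($key), "\n";  # prints: a\.c
print "abc" =~ $raw     ? "raw matches abc\n"     : "raw does not match abc\n";
print "abc" =~ $escaped ? "escaped matches abc\n" : "escaped does not match abc\n";
```

Without the escaping, the key 'a.c' would also replace "abc", "a-c", and so on.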

Output:

The bazd is under the quux in the quuxn.

Metaprogramming—that is, writing a program that writes another program—is also nice. The beginning looks familiar:

#! /usr/bin/perl

use warnings;
use strict;

use File::Compare;

die "Usage: $0 path ..\n" unless @ARGV >= 1;

# stub
my @pairs = (
  ["foo"     => "baz"],
  ["bar"     => "quux"],
  ['foo$bar' => 'potrzebie\\'],
);

Now we generate the program that does all the s/// replacements. (Is quotemeta on the replacement side a good idea?)

my $code =
  "sub { while (<>) { " .
  join(" " => map "s/" . quotemeta($_->[0]) .
                  "/"  . quotemeta($_->[1]) .
                  "/g;",
              @pairs) .
  "print; } }";
#print $code, "\n";

and compile it with eval:

my $replace = eval $code
  or die "$0: eval: $@\n";
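
One way to check what the generator produces is to build the same kind of code for a single pair and run it on a string instead of on files. This is only a sketch: the generated sub takes its input as an argument rather than reading <>, and the name $fix is made up.

```perl
use strict;
use warnings;

# One pair with metacharacters on both sides (the last pair from the stub list).
my @pairs = (['foo$bar' => 'potrzebie\\']);

# Generate the source of a sub that edits its argument, so the result
# can be inspected without touching any files.
my $code =
  'sub { my ($s) = @_; ' .
  join(' ', map '$s =~ s/' . quotemeta($_->[0]) .
                '/'        . quotemeta($_->[1]) .
                '/g;',
            @pairs) .
  ' return $s; }';

print "$code\n";

my $fix = eval $code
  or die "eval: $@\n";

print $fix->('a foo$bar b'), "\n";   # prints: a potrzebie\ b
```

In this case quotemeta on the replacement side does no harm: the backslashes it adds in front of "$" and "\" are consumed when the replacement is interpolated, and they keep the generated s/// from interpolating a variable or ending early. Note that the full program applies its substitutions one after another, so an earlier pair such as foo => baz can rewrite text before a later pattern gets a chance to match it.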

To do the replacements, we use Perl's ready-made in-place editing:

# set up in-place editing
$^I = ".bak";
my @save_argv = @ARGV;

$replace->();
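
For anyone who has not used $^I before, here is a self-contained sketch of in-place editing on a scratch file. It assumes File::Temp is available (it ships with Perl); the file name and the single substitution are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch file so the sketch does not touch real data.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/sample.txt";
open my $out, '>', $path or die "open $path: $!";
print {$out} "foo and bar\n";
close $out or die "close $path: $!";

{
    # With $^I set, <> renames each file in @ARGV to a backup and
    # print writes to a new file under the original name.
    local $^I   = '.bak';
    local @ARGV = ($path);
    while (<>) {
        s/foo/baz/g;
        print;
    }
}

# Read the edited file back to check the result.
open my $in, '<', $path or die "open $path: $!";
my $edited = do { local $/; <$in> };
close $in;
print $edited;   # prints: baz and bar
```

The original contents end up in sample.txt.bak, which is the backup the cleanup loop in the answer looks at.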

Below is an extra nicety that restores backups that the File::Compare module judges to have been unnecessary:

# in-place editing is conservative: it creates backups
# regardless of whether it modifies the file
foreach my $new (@save_argv) {
  my $old = $new . $^I;
  if (compare($new, $old) == 0) {
    rename $old => $new
      or warn "$0: rename $old => $new: $!\n";
  }
}
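
The explicit "== 0" matters because File::Compare's compare is documented to return 0 for identical files, 1 for different files, and -1 on error, so a bare truth test would not do. A small sketch on scratch files (write_file is a hypothetical helper):

```perl
use strict;
use warnings;
use File::Compare;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Hypothetical helper: write a small scratch file and return its path.
sub write_file {
    my ($name, $content) = @_;
    my $path = "$dir/$name";
    open my $fh, '>', $path or die "open $path: $!";
    print {$fh} $content;
    close $fh or die "close $path: $!";
    return $path;
}

my $one   = write_file('one.txt',   "same\n");
my $two   = write_file('two.txt',   "same\n");
my $three = write_file('three.txt', "different\n");

my $same    = compare($one, $two);     # 0: contents are identical
my $differs = compare($one, $three);   # 1: contents differ

print "same=$same differs=$differs\n";   # prints: same=0 differs=1
```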

Greg Bacon 2010-09-22 17:27:49

You should also `quotemeta` the keys before putting them into the regexp.

Ivan Nevostruev 2010-09-22 17:36:27

As I am completely green in Perl, can you explain what is done here? Specifically this line: @{["(" . join("|" => keys %replace) . ")"]}

Artium 2010-09-22 17:50:22

@Artium I got carried away. See updated answer.

Greg Bacon 2010-09-22 18:03:22

@Greg : That fat comma really hurts readability. Just a personal opinion...

Zaid 2010-09-22 18:11:50

Answer 3

+2 A:

There are two ways. Both of them require you to compile a regex alternation on the keys of the table:

my %table = qw<The A the a quick slow lazy dynamic brown pink . !>;
my $alt
    = join( '|'
          , map  { quotemeta }
            sort { ( length $b <=> length $a ) || $a cmp $b }
            keys %table
          )
    ;
my $keyword_regex = qr/($alt)/;
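
The longest-first sort matters because Perl tries alternatives left to right and takes the first one that matches, not the longest. A sketch with two made-up keys where one is a prefix of the other:

```perl
use strict;
use warnings;

my @keys = qw(bar barn);   # hypothetical keys; 'bar' is a prefix of 'barn'

# Alternation in the order given.
my $unsorted = join '|', map { quotemeta($_) } @keys;

# Alternation with longer keys first.
my $sorted = join '|', map { quotemeta($_) }
             sort { length($b) <=> length($a) or $a cmp $b } @keys;

my ($first)   = "barn" =~ /($unsorted)/;
my ($longest) = "barn" =~ /($sorted)/;

print "$unsorted captures $first\n";   # prints: bar|barn captures bar
print "$sorted captures $longest\n";   # prints: barn|bar captures barn
```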

Then you can use this regex in a substitution:

my $text 
    = <<'END_TEXT';
The quick brown fox jumped over the lazy dog.  The quick brown fox jumped over the lazy dog. 
The quick brown fox jumped over the lazy dog.  The quick brown fox jumped over the lazy dog.  
END_TEXT

$text =~ s/$keyword_regex/$table{ $1 }/ge; # <- 'e' means execute code
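
As a rough illustration of what /e changes: without it the replacement is interpolated like a double-quoted string, and with it the replacement is evaluated as a Perl expression. The table and strings here are made up:

```perl
use strict;
use warnings;

my %table = (quick => 'slow');   # hypothetical one-entry table

# Without /e: the replacement is interpolated, so a hash lookup works as-is.
(my $plain = 'a quick fox') =~ s/(quick)/$table{$1}/g;

# With /e: the replacement is code, so it can call functions on the value.
(my $evald = 'a quick fox') =~ s/(quick)/uc $table{$1}/ge;

print "$plain\n";   # prints: a slow fox
print "$evald\n";   # prints: a SLOW fox
```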

Or you can do it in a loop:

use English qw<@LAST_MATCH_START @LAST_MATCH_END>;
while ( $text =~ /$keyword_regex/g ) { 
    my $key = $1;
    my $rep = $table{ $key };
    # use the 4-arg form
    substr( $text, $LAST_MATCH_START[1]
          , $LAST_MATCH_END[1] - $LAST_MATCH_START[1], $rep 
          );
    # move the search position to just past the inserted replacement
    pos( $text ) = $LAST_MATCH_START[1] + length $rep;
}

Axeman 2010-09-22 17:51:40

Answer 4

A:

If eval is not a security concern:

eval $(awk 'BEGIN { printf "sed \047"} {printf "%s", "s/\\<" $1 "\\>/" $2 "/g;"} END{print "\047 substtemplate"}' substwords )

This constructs a long sed command consisting of multiple substitution commands, so it can exceed your maximum command-line length. It expects each line of the word pair file to consist of two words separated by whitespace. Substitutions will be made for whole words only (no clbuttic substitutions).

It may choke if the word pair file contains characters that are significant to sed.

You can do it this way if your sed insists on -e:

eval $(awk 'BEGIN { printf "sed"} {printf "%s", " -e \047s/\\<" $1 "\\>/" $2 "/g\047"} END{print " substtemplate"}' substwords)
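
Since the question asks about Perl, here is a rough Perl sketch of the same whole-word behavior. It takes "old new" lines in the pair-file format described above and escapes the search words, so special characters are treated literally. The sub name make_word_substituter is hypothetical, and \b only behaves as a word boundary when each search word starts and ends with a word character:

```perl
use strict;
use warnings;

# Build a whole-word substituter from "old new" lines.
sub make_word_substituter {
    my (@lines) = @_;
    my %map;
    for my $line (@lines) {
        my ($old, $new) = split ' ', $line;
        next unless defined $new;   # skip blank or malformed lines
        $map{$old} = $new;
    }
    # Longest keys first, each escaped so it is matched literally.
    my $alt = join '|', map { quotemeta($_) }
              sort { length($b) <=> length($a) or $a cmp $b } keys %map;
    my $re = qr/\b($alt)\b/;
    return sub {
        my ($text) = @_;
        $text =~ s/$re/$map{$1}/g;
        return $text;
    };
}

my $subst = make_word_substituter("cat dog", "bar quux");
print $subst->("the cat sat in the category bar"), "\n";
# prints: the dog sat in the category quux
```

This does everything in one pass with a single regex, so there is no command line to overflow.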

Dennis Williamson 2010-09-22 22:42:09
