ansaurus

Question

Perl Regex - Condensing groups of find/replace

Answer 1

+4 A:

Here's a technique that should work pretty well if all of your search items are fixed strings:

my %title_replacements = (
  ' PHD.' => ' PHD ',
  ' P H D ' => ' PHD  ',
  # ...,
);

my $titles_to_replace = join '|',
  map quotemeta, 
  keys %title_replacements;

$titles_to_replace = qr/$titles_to_replace/;

sub substitute_titles {
  my ($in) = @_;
  $$in =~ s/($titles_to_replace)/$title_replacements{$1}/g;
}

If you're running on a perl older than 5.10.0 or 5.8.9, you should consider using Regexp::Trie or Regexp::Assemble to build the regex. On current perls the regex compiler automatically trie-optimizes any large list of alternations like this one, so I left out the unnecessary complication.
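
For illustration, here is a minimal, self-contained usage sketch of the technique above. The sample table and input string are made up. Sorting the keys longest-first is an addition that is not part of the answer: Perl tries alternatives left to right, so this is one way to keep a short key from shadowing a longer key that begins with the same text.

```perl
use strict;
use warnings;

# Sample table for illustration only; the real list comes from your data.
my %title_replacements = (
    ' PHD.'   => ' PHD ',
    ' P H D ' => ' PHD ',
    ' PROF.'  => ' PROF ',
);

# Longest keys first (an assumption added here, not in the answer above):
# alternation picks the first alternative that matches at a position.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) or $a cmp $b }
    keys %title_replacements;
my $titles_to_replace = qr/($alternation)/;

sub substitute_titles {
    my ($in) = @_;    # reference to the string, which is edited in place
    $$in =~ s/$titles_to_replace/$title_replacements{$1}/g;
    return;
}

my $name = 'JANE DOE PHD. AND JOHN ROE P H D ';
substitute_titles(\$name);
print "[$name]\n";
```

Passing a reference avoids copying the string on every call, which matters when the sub runs over many records.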

hobbs 2010-06-07 21:41:24

Answer 2

+4 A:

Rather than running each substitution separately, create a closure that can do the work for you in a more efficient way:

sub make_translator {
    my %table = @_;
    my $regex = join '|' => map {quotemeta} keys %table;
    $regex = qr/$regex/;

    return sub {s/($regex)/$table{$1}/g}
}

my $translator = make_translator
    ' PHD.'   => ' PHD ',
    ' P H D ' => ' PHD   ',
    ' PROF.'  => ' PROF ';   # ... the rest of the pairs

my @list_of_strings = qw/.../;

$translator->() for @list_of_strings;

It is fastest not to pass anything and to let the closure work on $_, which the for loop aliases to each array element in turn, so the strings are modified in place.
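
To make the aliasing behaviour concrete, here is a small self-contained sketch with made-up sample strings (the @names array is hypothetical, not the asker's data). It shows that the array elements themselves change, because the closure's s/// operates on $_ and the for modifier aliases $_ to each element.

```perl
use strict;
use warnings;

sub make_translator {
    my %table = @_;
    my $regex = join '|', map { quotemeta($_) } keys %table;
    $regex = qr/($regex)/;

    # The closure takes no arguments: it edits whatever $_ currently aliases.
    return sub { s/$regex/$table{$1}/g };
}

my $translator = make_translator(
    ' PHD.'  => ' PHD ',
    ' PROF.' => ' PROF ',
);

# Hypothetical sample data.
my @names = ( 'A SMITH PHD. ', 'B JONES PROF. ' );

# "for" aliases $_ to each element, so @names is updated in place.
$translator->() for @names;

print "[$_]\n" for @names;
```

If the original strings must be kept, copying the array first (my @copy = @names;) and translating the copy is one straightforward option.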

Eric Strom 2010-06-07 21:48:23

Answer 3

A:

I would most likely make a sub that created my patterns for me. This way all I would have to do is pass in an array of the titles I want normalized. Example:

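sub make_pattern {
    my $list_ref = shift;
    my %patterns;
    for my $title ( @{$list_ref} ) {
        my $result = uc $title;
        # Allow optional whitespace between letters and any trailing dots,
        # e.g. "PHD" becomes P\s*H\s*D\.*
        # The key is regex source only; compile it later with qr/$pattern/i.
        my $pattern = join( '\s*', map { quotemeta } split( //, $title ) ) . '\.*';
        $patterns{$pattern} = $result;
    }
    return \%patterns;
}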

my @titles = qw(PHD MD DR PROF);  # ... plus whatever other titles you have
my $conversion_hash = make_pattern(\@titles);

Then you use the resulting hash in conjunction with a closure, as shown in some of the other answers here. I have not had time to test my code yet, but it should work.
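
One possible way to combine such a hash with a closure is sketched below. It assumes the hash maps a regex source string (without delimiters) to the normalized title; the %conversion table, the make_normalizer name, and the word-boundary guards are all illustrative assumptions, not part of the answer. Because the keys are patterns and not fixed strings, the closure applies each compiled pattern in turn instead of looking up the matched text in the hash.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the generated hash:
# regex source (no delimiters) => normalized title.
my %conversion = (
    'P\s*H\s*D\.*' => 'PHD',
    'M\s*D\.*'     => 'MD',
);

# make_normalizer is an illustrative helper name.
sub make_normalizer {
    my ($table) = @_;

    # Compile each pattern once, in a fixed order so runs are repeatable.
    # \b and (?!\w) are added guards so a title is not matched inside a word.
    my @rules = map { [ qr/\b(?:$_)(?!\w)/i, $table->{$_} ] }
                sort keys %$table;

    # The closure edits $_ in place, like the translator closures above.
    return sub {
        for my $rule (@rules) {
            my ( $re, $title ) = @$rule;
            s/$re/$title/g;
        }
    };
}

my $normalize = make_normalizer( \%conversion );

# Hypothetical sample data.
my @names = ( 'JANE DOE P H D.', 'john roe md' );
$normalize->() for @names;

print "$_\n" for @names;
```

Running one substitution per title is slower than a single alternation, so this sketch trades some speed for the flexibility of pattern keys.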

stocherilac 2010-06-08 12:56:55
