tags:

views:

450

answers:

7

I have a string such as 'xxox-x' that I want to mask each line in a file against as such:

  • x's are ignored (or just set to a known value)
  • o's remain unchanged
  • the - is a variable length field that will keep everything else unchanged

therefore mask 'xxox-x' against 'deadbeef' would yield 'xxaxbeex'

the same mask 'xxox-x' against 'deadabbabeef' would yield 'xxaxabbabeex'

How can I do this succinctly preferrably using s operator?

+1  A: 
$ perl -pe 's/^..(.).(.+).$/xx$1x$2x/;'
deadbeef
xxaxbeex
deadabbabeef
xxaxabbabeex
caskey
thats close ... however, assume both the mask and the string are both variable ... perhaps a slightly different mask syntax could be used to provide something functionally similar to this
Reed Debaets
A: 

x can be translated to . and o to (.) whereas - becomes (.+?):

#!/usr/bin/perl

use strict; use warnings;

my %s = qw( deadbeef xxaxbeex deadabbabeef xxaxabbabeex);

for my $k ( keys %s ) {
    (my $x = $k) =~ s/^..(.).(.+?).\z/xx$1x$2x/;
    print +($x eq $s{$k} ? 'good' : 'bad'), "\n";
}
Sinan Ünür
Can you extend your code to handle arbitrary masks with an arbitrary number of `o`s?
daotoad
+7  A: 
$mask =~ s/-/'o' x (length $str - length $mask)/e;
$str =~ s/(.)/substr($mask, pos $str, 1) eq 'o' ? $1 : 'x'/eg;
ephemient
Excellent. This resolves the issues with keeping track of back references as you build your substitution string from the mask.
daotoad
excellent idea with replacing - with the appropriate number of o's
Reed Debaets
as a side note ... I think you need to initialize $pos to 0 and change substr($mask,pos,1) to substr($mask,$pos++,1)
Reed Debaets
Nope, `pos` is a Perl function describing where in the string the last regex match was. Of course, I kinda screwed up and should have written `pos($str)`, but it would have worked as written if it were `$_` :)
ephemient
nice ... thanks for the feedback
Reed Debaets
@ephemient, Great answer!
Mike
@ephemient, very nice, elegant and simple!
zen
Add one to the 'x' argument to account for the dash in the length calculation. $mask =~ s/-/'o' x (length($str) - length($mask) + 1)/e;
zen
Yeah, that's a bug that OP fixed in http://stackoverflow.com/questions/1871092/1871297#1871297
ephemient
@Reed That wouldn't hurt my feelings. I've edited my answer, so it should let you accept ephemient's elegant solution instead!
Greg Bacon
+1  A: 

Compile your pattern into a Perl sub:

sub compile {
  use feature 'switch';
  my($pattern) = @_;
  die "illegal pattern" unless $pattern =~ /^[-xo]+$/;

  my($search,$replace);
  my $i = 0;
  for (split //, $pattern) {
    given ($_) {
      when ("x") {
        $search  .= "."; $replace .= "x";
      }
      when ("o") {
        $search  .= "(?<sub$i>.)";
        $replace .= "\$+{sub$i}";
        ++$i;
      }
      when ("-") {
        $search  .= "(?<sub$i>.*)";
        $replace .= "\$+{sub$i}";
        ++$i;
      }
    }
  }

  my $code = q{
    sub {
      local($_) = @_;
      s/^SEARCH$/REPLACE/s;
      $_;
    }
  };
  $code =~ s/SEARCH/$search/;
  $code =~ s/REPLACE/$replace/;

  #print $code;
  local $@;
  my $sub = eval $code;
  die $@ if $@;

  $sub;
}

To be more concise, you could write

sub _patref { '$+{sub' . $_[0]++ . '}' }

sub compile {
  my($pattern) = @_;
  die "illegal pattern" unless $pattern =~ /^[-xo]+$/;

  my %gen = (
    'x' => sub { $_[1] .= '.';               $_[2] .= 'x' },
    'o' => sub { $_[1] .= "(?<sub$_[0]>.)";  $_[2] .= &_patref },
    '-' => sub { $_[1] .= "(?<sub$_[0]>.*)"; $_[2] .= &_patref },
  );

  my($i,$search,$replace) = (0,"","");
  $gen{$1}->($i,$search,$replace)
    while $pattern =~ /(.)/g;

  eval "sub { local(\$_) = \@_; s/\\A$search\\z/$replace/; \$_ }"
    or die $@;
}

Testing it:

use v5.10;

my $replace = compile "xxox-x";

my @tests = (
  [ deadbeef     => "xxaxbeex" ],
  [ deadabbabeef => "xxaxabbabeex" ],
);

for (@tests) {
  my($input,$expect) = @$_;
  my $got = $replace->($input);
  print "$input => $got : ", ($got eq $expect ? "PASS" : "FAIL"), "\n";
}

Output:

deadbeef => xxaxbeex : PASS
deadabbabeef => xxaxabbabeex : PASS

Note that you'll need Perl 5.10.x for given ... when.

Greg Bacon
this is the route I was just heading down ... thanks!
Reed Debaets
Disregard that other comment. I fail at life. I think a more elegant solution can be found, but that wasn't it.
Chris Lutz
Though I will note that `for(@a) { given($_) { } }` is redundant. Perlsyn (http://perldoc.perl.org/perlsyn.html#Switch-statements) specifically says that `for(@a) { when "x" { ... } }` is allowed.
Chris Lutz
A: 

heres a quick stab at a regex generator.. maybe somebody can refactor something pretty from it?

#!/usr/bin/perl

use strict;
use Test::Most qw( no_plan );

my $mask = 'xxox-x';

is( mask( $mask, 'deadbeef' ),     'xxaxbeex' );
is( mask( $mask, 'deadabbabeef' ), 'xxaxabbabeex' );

sub mask {
    my ($mask, $string) = @_;
    my $regex = $mask;
    my $capture_index = 1;

    my $mask_rules = {
     'x' => '.',
     'o' => '(.)',
     '-' => '(.+)',
    };

    $regex =~ s/$_/$mask_rules->{$_}/g for keys %$mask_rules;
    $mask  =~ s/$_/$mask_rules->{$_}/g for keys %$mask_rules;

    $mask  =~ s/\./x/g;
    $mask  =~ s/\([^)]+\)/'$' . $capture_index++/eg;

    eval  " \$string =~ s/^$regex\$/$mask/ ";

    $string;

}
zen
A: 
sub mask {
    local $_ = $_[0];
    my $mask = $_[1];
    $mask =~ s/-/'o' x (length($_)-(length($mask)-1))/e;
    s/(.)/substr($mask, pos, 1) eq 'o' && $1/eg;
    return $_;
}

Used tidbits from a couple answers ... this is what I ended up with.

EDIT: update from comments

Reed Debaets
ephemient
A: 

Here's a character by character solution using substr rather that split. It should be efficient for long strings since it skips processing the middle part of the string (when there is a dash).

sub apply_mask {
    my $mask = shift;
    my $string = shift;

    my ($head, $tail) = split /-/, $mask;

    for( 0 .. length($head) - 1 ) {
        my $m = substr $head, $_, 1;

        next if $m eq 'o';
        die "Bad char $m\n" if $m ne 'x';

        substr($string, $_, 1) = 'x';
    }

    return $string unless defined $tail;

    $tail = reverse $tail;
    my $last_char = length($string) - 1;

    for( 0 .. length($tail) - 1 ) {
        my $m = substr $tail, $_, 1;

        next if $m eq 'o';
        die "Bad char $m\n" if $m ne 'x';


        substr($string, $last_char - $_, 1) = 'x';

    }

    return $string;
}
daotoad