I would like to use a regular expression to mask all but the first three alphanumeric characters of each word in a string using a mask character (such as "x"), so "1 Buckingham Palace Road, London" would become "1 Bucxxxxxxx Palxxx Roax, Lonxxx".

Keeping the first three characters is easily done using

s/\b(\w{0,3})(.*)\b/$1/g

but I cannot seem to figure out how to insert length($2) times the masking character instead of $2.

Thanks!

+3  A: 

C#:

new Regex(@"(?<!\b.{0,2}).").Replace("1 Buckingham Palace Road, London", "x");

Since you say it's language-agnostic, I trust this can be easily ported into your language of choice...

Or you could just take the length of $2 and fill in the x's the old-fashioned way.
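For illustration, the "old-fashioned" route of measuring the second group and filling in the mask could look something like the sketch below. It is written in Python rather than C#, and the function name `mask_words` and its `keep` and `mask` parameters are made up for this example; it only assumes that the regex engine lets you pass a function as the replacement.

```python
import re

def mask_words(text, keep=3, mask="x"):
    """Keep the first `keep` word characters of each word and mask the rest."""
    def repl(match):
        head, tail = match.group(1), match.group(2)
        # Emit one mask character per character captured in the second group.
        return head + mask * len(tail)

    # Group 1: up to `keep` leading word characters.
    # Group 2: the remaining word characters of the same word (possibly none).
    pattern = r"\b(\w{1,%d})(\w*)" % keep
    return re.sub(pattern, repl, text)

print(mask_words("1 Buckingham Palace Road, London"))
```

Note that `\w` also matches the underscore, so this is only as strict about "alphanumeric" as the original pattern in the question.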

Matthew Flaschen
Excellent solution. +1 for using negative lookbehind.
Artem Russakovskii
Looks nice. Apparently the problem is not as language-agnostic as I originally thought, since Perl does not seem to implement variable-length lookbehind (but C# does). I will try to figure out how to work around this later today.
tg
+1  A: 

Use a positive lookbehind: any word character that has three word characters before it gets changed to an x:

s/(?<=\w{3})\w/$1x/g;

Example Perl script:

my $string = "1 Buckingham Palace Road, London"; 
$string =~ s/(?<=\w{3})\w/$1x/g; 
print qq($string\n);
Danalog
Better written as $string =~ s/((?<=\w{3})\w)/$1x/g to suppress warnings like "Use of uninitialized value $1 in concatenation (.) ..." for every replacement.
fgm
`s/(?<=\w{3})\w/x/g` (note: no `$1`)
J.F. Sebastian
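As a rough cross-check of the fixed-width lookbehind idea from this answer, here is the same pattern (without `$1`) ported to Python. The function name `mask_with_lookbehind` is invented for this sketch. Python's `re` module, like Perl, only accepts fixed-width lookbehind, which `\w{3}` satisfies.

```python
import re

def mask_with_lookbehind(text, mask="x"):
    # Replace every word character that is immediately preceded by three
    # word characters. The lookbehind inspects the original string, not the
    # already-substituted output, so the whole tail of a long word is masked.
    return re.sub(r"(?<=\w{3})\w", mask, text)

print(mask_with_lookbehind("1 Buckingham Palace Road, London"))
```

Because the lookbehind consumes nothing, the replacement is just the mask character and no capture group is needed.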
A: 
use warnings;
use strict;

my $string = "1 Buckingham Palace Road, London";

$string =~ s(
  \b(\w{0,3})(\w*)\b
){
  $1 . ( 'x' x length $2 )
}gex;

print $string, "\n";
Brad Gilbert