I'm looking for a way to obfuscate mailtos in the source code of a web site. I'd like to go from this:

href="mailto:president@whitehouse.gov"

To this:

href="" onmouseover="this.href='mai'+'lto:'+'pre'+'sid'+'ent'+'@wh'+'ite'+'hou'+'se.'+'gov'"

I'm probably going to go with a PHP solution instead, like this (that way I only have to globally replace the entire mailto, and the source on my end will look better), but I spent too much time looking at sed and Perl and now I can't stop thinking about how this could be done! Any ideas?

Update: Based heavily on eclark's solution, I eventually came up with this:

#!/usr/bin/env perl -pi
if (/href="mailto/i) {
    my $start = (length $`) +6;
    my $len = index($_,'"',$start)-$start;
    substr($_,$start,$len,'" onmouseover="this.href=' .
    join('+',map qq{'$_'}, substr($_,$start,$len) =~ /(.{1,3})/g));
}
+3  A: 
#!/usr/bin/perl

use strict; use warnings;

my $s = 'mailto:president@whitehouse.gov';

my $obfuscated = join('+' => map qq{'$_'}, $s =~ /(.{1,3})/g );

print $obfuscated, "\n";

Output:

'mai'+'lto'+':pr'+'esi'+'den'+'t@w'+'hit'+'eho'+'use'+'.go'+'v'

Note that 'lto:' in your example is four characters, whereas it looks like you want three-character groups.

Sinan Ünür
Thanks for that; I didn't notice!
Matthew Phipps
A: 

Just an example.

$ echo $s
href="mailto:president@whitehouse.gov"

$ echo $s | sed 's|\(...\)|\1+|g' | sed 's/hre+f=\"/href="" onmouseover="this.href=/'
href="" onmouseover="this.href=+mai+lto+:pr+esi+den+t@w+hit+eho+use+.go+v"
ghostdog74
Not bad! If the string is actually divisible by three, will you have an extra plus at the end that you'll have to sub out with another sed? Not that I'm bugging you about it; it *is* just an example :)
Matthew Phipps
Yes, it is just an example. I don't want to go into details, as I want the OP to do it himself if he wants.
ghostdog74
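For the stray-plus question raised in the comments above, one possible variation is sketched below. It is not ghostdog74's method: the function name `obfuscate` is made up for illustration, the input is assumed to be only the attribute value (no `href="..."` wrapper), and the value is assumed to contain no single quotes.

```shell
#!/bin/sh
# Hypothetical helper: quote each run of up to three characters and join with '+'.
obfuscate() {
    # First expression: wrap every 1-3 character chunk in single quotes, then a plus.
    # Second expression: drop the single plus left over at the end of the line.
    printf '%s\n' "$1" | sed "s/.\{1,3\}/'&'+/g; s/+\$//"
}

s='mailto:president@whitehouse.gov'
printf '%s\n' "href=\"\" onmouseover=\"this.href=$(obfuscate "$s")\""
```

Because the trailing plus is always produced and always removed, it does not matter whether the length of the string is divisible by three.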
A: 

Is this close enough?

use strict; 
use warnings; 

my $old = 'href="mailto:president@whitehouse.gov"';
$old =~ s/href="(.*)"/$1/;
my $new = join '+', map { qq('$_') } grep { length $_ } split /(.{3})/, $old;
$new = qq(href=""\nonmouseover="this.href=$new\n");
print "$new\n";

__END__

href=""
onmouseover="this.href='mai'+'lto'+':pr'+'esi'+'den'+'t@w'+'hit'+'eho'+'use'+'.go'+'v'
"
toolic
Cool; thanks a lot for your response! This is the one I looked at first, in order to try and understand the others :) I don't think I want that extra newline though, and I like Sinan's sidestepping of that extra grep. I do have a question though; why does split require that grep for the behavior we want? Is it because technically the matched string is a delimiter for empty strings?
Matthew Phipps
You're welcome. The extra newline was used to try to match your output. I also like Sinan's solution better than mine. I kept mine here only because it showed how to handle the href (which Sinan's does not). You are correct: the ugly `grep`/`length` is needed to filter out the empty strings returned by `split`. `split` with parens preserves the delimiter.
toolic
A: 

Building on Sinan's idea, here's a short perl script that will process a file line by line.

#!/usr/bin/env perl -p

my $start = index($_,'href="') +6;
my $len = index($_,'"',$start)-$start;
substr($_,$start,$len+1,'" onmouseover="this.href=' .
  join('+',map qq{'$_'}, substr($_,$start,$len) =~ /(.{1,3})/g)
);

If you're going to use it, make sure you have your old files committed to source control and change the -p option to -i, which will rewrite a file in place.

eclark
That's great! Sorry I didn't say explicitly that I didn't actually know how to get to the string I wanted in perl either... but you read my mind and put the extra functionality down anyway :) I think you have an extra +1 in the first substr though. I also ended up adding case insensitivity. Oh, and I added mailto to the match string to avoid doing the same thing to all links, so the page will at least work and be crawlable without JS. Finally, just perl -i didn't seem to do anything; only -pi actually looped through the file...?
Matthew Phipps
A: 

Ack! Thppfft! I offer you this hairball:

s='href="mailto:president@whitehouse.gov"'
echo "$s" | sed -n 's/=/=\x22\x22\n/;
h;
s/\n.*//;
x;
s/[^\n]*\n//;
s/"//g;
s/\(...\)/\x27&\x27+/g;
s/.*/onmouseover=\x22this.href=&\x22/;
x;
G;
s/\n//2;
s/+\([^\x22]\{1,2\}\)\x22$/+\x27\1\x27\x22/;
s/+\x22$/\x22/;
p'
Dennis Williamson
No kidding! I can see why perl was written, despite it still having a reputation for being nastier than Steve Dallas's feet...
Matthew Phipps
Oh wait... I just noticed the octets. Did you go out of your way to make this as obscure as possible? ;)
Matthew Phipps