Most probably I'm missing something obvious here, but why do I need to call the search/replace regex twice to have any effect in the following code? If I call it only once, the replacement doesn't take place :-(

use strict;
use warnings;
use LWP::Simple;

my $youtubeCN = get(shift @ARGV);
die("Script tag not found!\n")
 unless $youtubeCN =~ /<script src="(.*?)">/;
my $youtubeScr = $1;
# WHY ???
$youtubeScr =~ s/&amp;/&/g;
$youtubeScr =~ s/&amp;/&/g;
my $gmodScr = get($youtubeScr);

$gmodScr =~ s/http:\/\/\?container/http:\/\/www.gmodules.com\/ig\/ifr\?/;
print "<script type=\"text/javascript\">$gmodScr</script>\n";

Update: I call this script like this:

perl bork_youtube_channel.pl 'http://www.youtube.com/user/pennsays'

If &amp; isn't properly transformed into &, I get back an HTML page (probably an error page) rather than JavaScript from the second get() call.

Update: It turns out that the URL was double encoded after all. Thank you all for your help!

+7  A: 

I suspect that if you look at the input data, it is doing the right thing - my guess is that in the middle of encoding and decoding, you're not seeing the real input and output. For example, try this:

use strict;
use warnings;

my $youtubeScr = "a&amp;b";

$youtubeScr =~ s/&amp;/&/g;
print $youtubeScr;
print "\n";

$youtubeScr =~ s/&amp;/&/g;
print $youtubeScr;
print "\n";

This prints

a&b
a&b

In other words, it's already worked to start with.

Are you sure your original text isn't foo&amp;amp;bar? That would give output of

foo&amp;bar
foo&bar

with the above code.
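If the input does turn out to be encoded more than once, one option is to repeat the substitution until it stops changing anything, instead of writing the same line twice. This is only a sketch: the helper name decode_amp is made up for illustration, and it assumes &amp; is the only entity that needs undoing.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): strips any
# number of layers of &amp; encoding from a string.
sub decode_amp {
    my ($text) = @_;
    # s///g returns the number of substitutions it made (false if none),
    # so the loop ends on the first pass that changes nothing.
    1 while $text =~ s/&amp;/&/g;
    return $text;
}

print decode_amp("foo&amp;amp;bar"), "\n";   # prints foo&bar
print decode_amp("a&amp;b"), "\n";           # prints a&b
```

Bear in mind that this also collapses text that was meant to contain a literal &amp; after decoding, so it only makes sense when the extra layers are known to be an accident.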

PS My perl-fu sucks. Apologies for any language abuses in the above code, but I think it should still be helpful :)

Jon Skeet
I agree that this probably has something to do with double encoding of the HTML entities, as there is no direct reason why this should be happening.
Xetius
I verified it and it is not an issue with double encoding. My only guess at the moment is that the issue is that I'm getting the string from a capture (i.e. $1), but I'm not sure how or why...
Cd-MaN
Well, try writing out the value before the first replacement, between the two replacements and after the second replacement - and post that information please.
Jon Skeet
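The check suggested in the comment above could look roughly like this. It is a sketch, not the asker's actual script: trace_decode is a hypothetical helper, and the URL is an invented stand-in for the value captured in $1.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the string before the first replacement,
# between the two replacements, and after the second one.
sub trace_decode {
    my ($text) = @_;
    my @stages = ($text);
    for (1 .. 2) {
        $text =~ s/&amp;/&/g;
        push @stages, $text;
    }
    return @stages;
}

# Invented sample input; in the real script this would be $1.
my @labels = ('before', 'between', 'after');
my @stages = trace_decode('http://example.com/?a=1&amp;amp;b=2');
printf STDERR "%-8s %s\n", $labels[$_], $stages[$_] for 0 .. $#stages;
```

If the "between" line still contains &amp;, the input was encoded more than once.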
Brad Gilbert