I have this input text:
<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body><table cellspacing="0" cellpadding="0" border="0" align="center" width="603"> <tbody><tr> <td><table cellspacing="0" cellpadding="0" border="0" width="603"> <tbody><tr> <td width="314"><img height="61" width="330" src="/Elearning_Platform/dp_templates/dp-template-images/awards-title.jpg" alt="" /></td> <td width="273"><img height="61" width="273" src="/Elearning_Platform/dp_templates/dp-template-images/awards.jpg" alt="" /></td> </tr> </tbody></table></td> </tr> <tr> <td><table cellspacing="0" cellpadding="0" border="0" align="center" width="603"> <tbody><tr> <td colspan="3"><img height="45" width="603" src="/Elearning_Platform/dp_templates/dp-template-images/top-bar.gif" alt="" /></td> </tr> <tr> <td background="/Elearning_Platform/dp_templates/dp-template-images/left-bar-bg.gif" width="12"><img height="1" width="12" src="/Elearning_Platform/dp_templates/dp-template-images/left-bar-bg.gif" alt="" /></td> <td width="580"><p> what y all heard?</p><p>i'm shark oysters.</p> <p> </p> <p> </p> <p> </p> <p> </p> <p> </p> <p> </p></td> <td background="/Elearning_Platform/dp_templates/dp-template-images/right-bar-bg.gif" width="11"><img height="1" width="11" src="/Elearning_Platform/dp_templates/dp-template-images/right-bar-bg.gif" alt="" /></td> </tr> <tr> <td colspan="3"><img height="31" width="603" src="/Elearning_Platform/dp_templates/dp-template-images/bottom-bar.gif" alt="" /></td> </tr> </tbody></table></td> </tr> </tbody></table> <p> </p></body></html>
As you can see, there are no newlines in this chunk of HTML. I need to find all the image links in it, copy the images to a directory, and change each link in the text to something like ./images/file_name.
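For the copy-and-rename step on a single image, a minimal sketch could look like the following. It assumes, as the loop below does, that a URL such as /Elearning_Platform/... lives directly under the on-disk root /media/www/vprimary, and that the rewritten HTML sits next to the images/ folder. The sub name localize_image and its parameters are hypothetical.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(copy);
use File::Path qw(make_path);

# Hypothetical helper: copy the file behind one src URL into $dest_dir and
# return the relative link that should replace the URL in the HTML.
# Assumes the URL path maps directly onto $doc_root (e.g. /media/www/vprimary)
# and that $dest_dir is the images/ folder next to the output HTML.
sub localize_image {
    my ($src, $doc_root, $dest_dir) = @_;
    my $on_disk = $doc_root . $src;      # full path of the original file
    my $name    = basename($src);        # e.g. awards.jpg
    make_path($dest_dir) unless -d $dest_dir;
    copy($on_disk, "$dest_dir/$name")
        or die "Cannot copy $on_disk to $dest_dir/$name: $!";
    return "./images/$name";
}
```

For the sample, localize_image('/Elearning_Platform/dp_templates/dp-template-images/awards.jpg', '/media/www/vprimary', './images') would copy /media/www/vprimary/Elearning_Platform/dp_templates/dp-template-images/awards.jpg into ./images and return ./images/awards.jpg.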
Currently, the Perl code that I'm using looks like this:
my ($old_src, $new_src, $folder_name);
foreach my $record (@readfile) {
    ## reset for each record so the if/else for the URL replacement block below is correct
    $old_src = "";
    $new_src = "";
    if ($record =~ /\<img(.+)/) {
        if ($1 =~ /src=\"((\w|_|\\|-|\/|\.|:)+)\"/) {
            $old_src = $1;
            my @tmp = split(/\/Elearning/, $old_src);
            $new_src = "/media/www/vprimary/Elearning" . $tmp[-1];
            push(@images, $new_src);
            $folder_name = "images";
        } ## end if
    }
    elsif ($record =~ /background=\"(.+\.jpg)/) {
        $old_src = $1;
        my @tmp = split(/\/Elearning/, $old_src);
        $new_src = "/media/www/vprimary/Elearning" . $tmp[-1];
        push(@images, $new_src);
        $folder_name = "images";
    }
    elsif ($record =~ /\<iframe(.+)/) {
        if ($1 =~ /src=\"((\w|_|\\|\?|=|-|\/|\.|:)+)\"/) {
            $old_src = $1;
            my @tmp = split(/\/Elearning/, $old_src);
            $new_src = "/media/www/vprimary/Elearning" . $tmp[-1];
            ## remove the ?rand behind the html file name; $old_src keeps it and
            ## is quoted with \Q...\E below, so its "?" needs no manual escaping
            if ($new_src =~ /\?rand/) {
                ($new_src) = split(/\?/, $new_src);
            }
            print "old_src::$old_src\n"; ##s7test
            print "new_src::$new_src\n\n"; ##s7test
            push(@iframes, $new_src);
            $folder_name = "iframes";
        } ## end if
    } ## end if
    my $new_record = $record;
    if ($old_src && $new_src) {
        ## \Q...\E treats the paths as literal text, not as regex patterns
        $new_record =~ s/\Q$old_src\E/$new_src/;
        print "new_record:$new_record\n"; ##s7test
        my @tmp = split(/\//, $new_src);
        $new_record =~ s/\Q$new_src\E/\.\\$folder_name\\$tmp[-1]/;
        ## print "new_record2:$new_record\n\n"; ##s7test
    } ## end if
    print WRITEFILE $new_record;
} # foreach
This only works when the HTML contains newlines, because the code handles at most one link per line. I thought about simply looping the regex, but then I would have to change each matched tag to some other text so the same tag isn't matched again.
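Looping the regex does not by itself require rewriting the matched text. In scalar context a m//g match resumes where the previous one ended (the position is available through pos), so a while loop visits every img tag in a single-line string. A minimal sketch, assuming src values are double-quoted as in the sample; $html and @srcs are illustrative names:

```perl
use strict;
use warnings;

# Two <img> tags on one line, taken from the sample input
my $html = '<td width="314"><img height="61" width="330" src="/Elearning_Platform/dp_templates/dp-template-images/awards-title.jpg" alt="" /></td>'
         . '<td width="273"><img height="61" width="273" src="/Elearning_Platform/dp_templates/dp-template-images/awards.jpg" alt="" /></td>';

my @srcs;
# In scalar context, m//g resumes after the previous match (see pos), so the
# loop visits each tag once without the matched text having to be changed.
while ($html =~ /<img\b[^>]*?\bsrc="([^"]+)"/gi) {
    push @srcs, $1;
}
print "$_\n" for @srcs;
```

Used in list context instead (my @srcs = $html =~ /<img\b[^>]*?\bsrc="([^"]+)"/gi;), the same match returns all captures at once.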
Is there an elegant Perl way to do this, or am I just missing the obvious? I do know that simply adding the global (/g) option doesn't work.
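One way to handle every link on a single-line string is a global substitution whose replacement is computed per match: with s///ge the right-hand side is evaluated as Perl code for each match, so it can remember the original URL (for copying later) and return the rewritten attribute. A minimal sketch, assuming double-quoted src/background attributes ending in .gif, .jpg/.jpeg or .png as in the sample; rewrite_links is a hypothetical name, and the iframe case is left out.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: rewrite every image src/background attribute in $html
# to ./images/<file name>. Returns the new HTML and the original URLs, so the
# files can be copied afterwards (e.g. with File::Copy).
sub rewrite_links {
    my ($html) = @_;
    my @originals;
    $html =~ s{
        \b(src|background)="([^"]+\.(?:gif|jpe?g|png))"   # attribute name, image URL
    }{
        # copy the captures first so later calls cannot disturb them
        my ($attr, $url) = ($1, $2);
        push @originals, $url;
        sprintf('%s="./images/%s"', $attr, basename($url));
    }gexi;
    return ($html, \@originals);
}

my $in = '<td background="/Elearning_Platform/dp_templates/dp-template-images/left-bar-bg.gif">'
       . '<img src="/Elearning_Platform/dp_templates/dp-template-images/top-bar.gif" alt="" /></td>';
my ($out, $found) = rewrite_links($in);
print "$out\n";
print "copy: $_\n" for @$found;
```

Because the whole string is processed in one pass, it no longer matters whether the HTML has newlines; the returned URL list can then be mapped onto the on-disk paths and copied.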
Thanks. ~steve