I am modifying some HTML pages and want to increase the font size dynamically with a regex. In my script below, I want the '8' and '3' to turn into '9' and '4', but I get '8++' and '3++' instead. I have the following:

#!/usr/bin/perl
use warnings;
use LWP::Simple;

my $content = "<TD><FONT STYLE=\"font-family:Verdana, Geneva, sans-serif\" SIZE=\"8\">this is just a bunch of text</FONT></TD>";
$content .= "<TD><FONT STYLE=\"font-family:Verdana, Geneva, sans-serif\" SIZE=\"3\">more text</FONT></TD>";

$content=~s/SIZE="(\d+)">/SIZE="$1++">/g;

print $content;     
+5  A: 

I'll just skip the part about how regexps are a bad way to parse HTML, because sometimes a quick-and-dirty solution is good enough.

You can't use an operator inside a string like that. The ++ is just treated as plain text (as you found). You have to use the /e flag to indicate that the replacement should be evaluated as Perl code, and then use the appropriate expression, like:

$content =~ s/SIZE="(\d+)">/'SIZE="' . ($1 + 1) . '">'/eg;

You can't use $1++ for two reasons. First, it would do the increment after returning the value, so you'd be replacing 8 with 8 instead of 9. Second, $1 is a read-only value, and the increment would want to modify it.
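
To make the three cases concrete, here is a minimal sketch that runs the same pattern without /e, with /e, and with `$1++` under /e. The shortened input string and the variable names (`$html`, `$literal`, `$bumped`, `$copy`) are made up for illustration; the `eval` block is only there to trap the read-only error so the script can keep going.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened, hypothetical input: just the part the pattern matches.
my $html = 'SIZE="8"> SIZE="3">';

# Without /e the replacement is an interpolated string, so "++" stays literal.
(my $literal = $html) =~ s/SIZE="(\d+)">/SIZE="$1++">/g;
print "$literal\n";    # SIZE="8++"> SIZE="3++">

# With /e the replacement is evaluated as a Perl expression.
(my $bumped = $html) =~ s/SIZE="(\d+)">/'SIZE="' . ($1 + 1) . '">'/eg;
print "$bumped\n";     # SIZE="9"> SIZE="4">

# $1++ under /e dies at run time because $1 is read-only; eval traps it.
my $copy = $html;
eval { $copy =~ s/SIZE="(\d+)">/$1++/eg; 1 }
    or print "Error: $@";
```

The last case should print the same "Modification of a read-only value attempted" message reported in the comment below.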

cjm
thanks but I get "Modification of a read-only value attempted at test.pl line 10."
ginius
I forgot that you can't use `++` with `$1`. Fixed.
cjm
works fine now. thanks!
ginius
If you could use the increment operator (`++`) here, you would have to put it before the variable (`++$1`) so that the incremented value is returned, but `$1` is a special read-only variable, so it produces an error either way.
vol7ron
+1  A: 

Use the `e` switch to execute Perl code in the replacement part of the substitution.
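
As a rough illustration, the replacement under `e` can hold more than one statement, and the value of the last one becomes the new text. The input string and the `$tag` and `$size` names below are made up for the example; copying `$1` into a lexical is one way to get around `$1` being read-only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical one-tag input, just for the demonstration.
my $tag = '<FONT SIZE="8">';

# The replacement is Perl code; the value of its last statement is substituted.
# $1 is copied into $size first because $1 itself cannot be incremented.
$tag =~ s/SIZE="(\d+)"/my $size = $1; 'SIZE="' . ++$size . '"'/e;

print "$tag\n";    # <FONT SIZE="9">
```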

knittl
You can also think of it as using the `e` switch to evaluate expressions inside the regex.
mobrule
+1  A: 
#!/usr/bin/perl -w

use strict;

sub main {
    my $c = qq{<TD><FONT STYLE="font-family:Verdana, Geneva, sans-serif" SIZE="8">this is just a bunch of text</FONT></TD>\n}
          . '<TD><FONT STYLE="font-family:Verdana, Geneva, sans-serif" SIZE="3">more text</FONT></TD>';

    # /e evaluates the replacement as Perl code. Build the new attribute
    # from the captures directly, without clobbering the global $_.
    $c =~ s/(SIZE=")(\d+)(")/$1 . ($2 + 1) . $3/eg;

    print "$c\n";
    # <TD><FONT STYLE="font-family:Verdana, Geneva, sans-serif" SIZE="9">this is just a bunch of text</FONT></TD>
    # <TD><FONT STYLE="font-family:Verdana, Geneva, sans-serif" SIZE="4">more text</FONT></TD>
}

main();
Armando
+1  A: 

You should consider using an HTML parser such as HTML::TokeParser::Simple:

#!/usr/bin/perl

use strict; use warnings;

use HTML::TokeParser::Simple;

my $content = "<TD><FONT STYLE=\"font-family:Verdana, Geneva, sans-serif\" SIZE=\"8\">this is just a bunch of text</FONT></TD>";
$content .= "<TD><FONT STYLE=\"font-family:Verdana, Geneva, sans-serif\" SIZE=\"3\">more text</FONT></TD>";

my $parser = HTML::TokeParser::Simple->new( \$content );

while ( my $token = $parser->get_token ) {
    if ( $token->is_start_tag('font') ) {
        my $font_size = $token->get_attr('size');
        if ( defined $font_size ) {
            ++ $font_size;
            $token->set_attr(size => $font_size);
        }
    }
    print $token->rewrite_tag->as_is;
}

Output:

<td><font style="font-family:Verdana, Geneva, sans-serif" size="9">this is just
a bunch of text</font></td><td><font style="font-family:Verdana, Geneva, 
sans-serif" size="4">more text</font></td>
Sinan Ünür