So I'm writing a quick Perl script that cleans up some HTML and runs it through an HTML-to-PDF program. I want to lose as little information as possible, so I'd like to extend my textareas to fit all the text that is currently in them. In my case, this means setting the number of rows to a value calculated from the length of the string inside the textarea.

This is the regex I'm currently using:

$file=~s/<textarea rows="(.+?)"(.*?)>(.*?)<\/textarea>/<textarea rows="(?{ length($3)/80 })"$2>$3<\/textarea>/gis;

Unfortunately, Perl doesn't seem to recognize what I was told was the syntax for embedding Perl code inside search-and-replace regexes. Are there any Perl junkies out there willing to tell me what I'm doing wrong? Regards, Zach

A: 

I believe your problem is an unescaped /

If it's not the problem, it certainly is a problem.

Try this instead; note the \/80:

$file=~s/<textarea rows="(.+?)"(.*?)>(.*?)<\/textarea>/<textarea rows="(?{ length($3)\/80 })"$2>$3<\/textarea>/gis;

The basic pattern for this code is:

$file =~ s/some_search/some_replace/gis;

The gis at the end are modifiers: g = global (replace every match, not just the first), i = case-insensitive, s = single-line mode (the . also matches newlines).
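
For what it's worth, the effect of the s flag is easy to check. A minimal sketch, assuming a hypothetical textarea whose text spans two lines (the sample string and variable names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical input: upper-case tag with a newline inside the text.
my $html = qq{<TEXTAREA rows="2">line one\nline two</TEXTAREA>};

# Without /s the dot cannot cross the newline, so the match fails.
my $without_s = $html =~ m{<textarea rows="(.+?)"(.*?)>(.*?)</textarea>}i  ? 1 : 0;

# With /s the dot also matches "\n", so the whole element matches.
my $with_s    = $html =~ m{<textarea rows="(.+?)"(.*?)>(.*?)</textarea>}is ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
```

The i flag is what lets the lower-case pattern match the upper-case tag here.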

George Marian
A Perl junkie, I am not, though I figured I'd work on it.
George Marian
+1  A: 

Must this be done with regex? Parsing any markup language (or even CSV) with regex is fraught with error. If you can, try to utilize a standard library:

http://search.cpan.org/dist/HTML-Parser/Parser.pm

Otherwise you risk the revenge of Cthulhu:

http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html

(Yes, the article leaves room for some simple string manipulation, so I think your soul is safe. :-)

eruciform
+2  A: 

The (?{...}) pattern is an experimental feature for executing code on the match side, but you want to execute code on the replacement side. Use the /e regular-expression switch for that:

#! /usr/bin/perl

use warnings;
use strict;

use POSIX qw/ ceil /;

while (<DATA>) {
  s[<textarea rows="(.+?)"(.*?)>(.*?)</textarea>] {
    my $rows = ceil(length($3) / 80);
    qq[<textarea rows="$rows"$2>$3</textarea>];
  }egis;
  print;
}

__DATA__
<textarea rows="123" bar="baz">howdy</textarea>

Output:

<textarea rows="1" bar="baz">howdy</textarea>
Greg Bacon
+1  A: 

The syntax you are using to embed code is only valid in the "match" portion of the substitution (the left-hand side). To embed code in the right-hand side (which is a normal Perl double-quoted string), you can do this:

$file =~ s{<textarea rows="(.+?)"(.*?)>(.*?)</textarea>}
          {<textarea rows="@{[ length($3)/80 ]}"$2>$3</textarea>}gis;

This uses the Perl idiom of "some string @{[ embedded_perl_code() ]} more string".
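
Outside of a substitution, the same idiom can be seen in a plain double-quoted string. A small sketch with made-up sample data (the @words array is hypothetical):

```perl
use strict;
use warnings;

my @words = qw(alpha beta gamma);   # hypothetical sample data

# [ ... ] builds an anonymous array from the expression's result,
# and @{ ... } dereferences it so the string interpolates it.
my $text = "count: @{[ scalar @words ]}, first: @{[ uc $words[0] ]}";

print "$text\n";   # count: 3, first: ALPHA
```

The code inside the brackets runs in list context, which is why scalar is needed to get the element count rather than the elements themselves.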

But if you are working with a very complex statement, it may be easier to put the substitution into "eval" mode, where it treats the replacement string as Perl code:

$file =~ s{<textarea rows="(.+?)"(.*?)>(.*?)</textarea>}
          {'<textarea rows="' . (length($3)/80) . qq{"$2>$3</textarea>}}gise;

Note that in both examples the regex is structured as s{}{}. This not only eliminates the need to escape the slashes, but also allows you to spread the expression over multiple lines for readability.

Eric Strom
A: 

First, you need to escape the / inside the expression in the replacement text (otherwise Perl will see an s/// operator followed by the number 80 and so on). Or you can use a different delimiter; for complex substitutions, matching brackets are a good idea.

Then you get to the main problem, which is that (?{...}) is only available in patterns. The replacement text is not a pattern, it's (almost) an ordinary string.

Instead, there is the e modifier to the s/// operator, which lets you write a replacement expression rather than replacement string.

$file =~ s(<textarea rows="(.+?)"(.*?)>(.*?)</textarea>)
          ("<textarea rows=\"" . (length($3)/80) . "\"$2>$3</textarea>")egis;
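
Note that length($3)/80 is ordinary division, so a five-character textarea would get rows="0.0625". If a whole number of rows is wanted, the expression can round up. A minimal sketch, assuming 80 characters per row as in the question; the sample input and the one-row minimum are assumptions of mine:

```perl
use strict;
use warnings;

# Hypothetical sample input: 100 characters should need 2 rows of 80.
my $file = '<textarea rows="1" name="notes">' . ('x' x 100) . '</textarea>';

# Integer ceiling of length/80. The "|| 1" keeps empty textareas at one
# row, which is an assumption and not part of the original expression.
$file =~ s{<textarea rows="(.+?)"(.*?)>(.*?)</textarea>}
          {
            my $rows = int((length($3) + 79) / 80) || 1;
            qq{<textarea rows="$rows"$2>$3</textarea>};
          }gise;

print "$file\n";
```

This only counts characters; text containing its own line breaks would need more rows than the estimate gives.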
Gilles