I'm looking through Perl code and I see this:

sub html_filter {
    my $text = shift;
    for ($text) {
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
        s/"/&quot;/g;
    }
    return $text;
}

What does the for loop do in this case, and why would you do it this way?

+2  A: 

It's just used to alias $text to $_, the default variable. Done because they're too lazy to use an explicit variable or don't want to waste precious cycles creating a new scalar.

Mark Canlas
+2  A: 

It's cleaning up &, <, > and quote characters by replacing them with the appropriate HTML entities.

Fortyrunner
+5  A: 

Without an explicit loop variable, the for loop uses the special variable called $_. The substitution statements inside the loop also use the special $_ variable because none other is specified, so this is just a trick to make the source code shorter. I would probably write this function as:

sub html_filter {
    my $text = shift;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

This will have no performance consequences and is readable by people other than Perl.

Greg Hewgill
You like boilerplate?
Svante
Are you implying that the original was not readable by people?
Mark Canlas
+8  A: 

The for loop aliases each element of the list it's looping over to $_. In this case, there is only one element, $text.

Within the body, this allows one to write

s/&/&amp;/g;

etc. instead of having to write

$text =~ s/&/&amp;/g;

repeatedly. See also perldoc perlsyn.
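
A minimal sketch of that aliasing, for illustration only (the sample string is made up). Because $_ is an alias for $text rather than a copy, the substitutions inside the loop change $text itself:

```perl
use strict;
use warnings;

my $text = 'a < b & c';   # hypothetical sample input

# for aliases $_ to $text for the duration of the block,
# so each s/// without an explicit target edits $text in place.
for ($text) {
    s/&/&amp;/g;
    s/</&lt;/g;
}

print "$text\n";   # prints: a &lt; b &amp; c
```

No value is copied back at the end of the loop; $text was being modified all along.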

Sinan Ünür
Greg Hewgill
Harold L
Copied it from Template::Toolkit. Learn something new every day!
Timmy
@Greg OK, fixed.
Sinan Ünür
+1  A: 

It loops through your text and substitutes ampersands (&) with &amp;, < with &lt;, > with &gt; and " with &quot;. You'd do this for output to an .html document... those are the proper HTML entities.

FreeMemory
+1  A: 

The original code could be more flexible by using wantarray to test the desired context:

sub html_filter {
    my @text = @_;
    for (@text) {
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
        s/"/&quot;/g;
    }
    return wantarray ? @text : "@text";
}

That way you could call it in list context or scalar context and get back the correct results, for example:

my @stuff = html_filter('"','>');
print "$_\n" for @stuff;

my $stuff = html_filter('&');
print $stuff;
For an interesting discussion of the evils/merits of wantarray() see http://www.perlmonks.org/?node_id=729965
daotoad
@daotoad... interesting is the right word. A thread full of opinions both for and against a variety of different issues dealing with context. In the end it all seems to add up to zero. But it is interesting.
+5  A: 

As Mr Hewgill points out, the code sample is implicitly localizing and aliasing to $_, the magical implied variable.

He offers a substitute that is more readable at the cost of boilerplate code.

There is no reason to sacrifice readability for brevity. Simply replace the implicit localization and assignment with an explicit version:

sub html_filter {
    local $_ = shift;

    s/&/&amp;/g;
    s/</&lt;/g;
    s/>/&gt;/g;
    s/"/&quot;/g;

    return $_;
}

If I didn't know Perl all that well and came across this code, I'd know that I needed to look at the docs for $_ and local. As a bonus, in perlvar there are a few examples of localizing $_.

For anyone who uses Perl a lot, the above should be easy to understand.

So there is really no reason to sacrifice readability for brevity here.
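
As a rough check of what local buys you here, the sketch below (the sample strings and the $escaped variable are made up for illustration) sets $_ in the caller, runs the filter, and confirms that the caller's $_ comes back untouched:

```perl
use strict;
use warnings;

sub html_filter {
    local $_ = shift;   # the caller's $_ is saved here and restored on return

    s/&/&amp;/g;
    s/</&lt;/g;
    s/>/&gt;/g;
    s/"/&quot;/g;

    return $_;
}

$_ = 'caller data';     # hypothetical value the caller still needs
my $escaped = html_filter('<b>"x" & y</b>');

print "$escaped\n";     # prints: &lt;b&gt;&quot;x&quot; &amp; y&lt;/b&gt;
print "$_\n";           # prints: caller data
```

Without the local, assigning to $_ inside the sub would overwrite whatever the caller had in it.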

daotoad
I like your observations, all except that it demands scalar context..... just kidding. "local $_" in place of the "for ($text)" block is not only a bit more succinct, it actually makes more sense to me.
In 5.10 you can write my $_
Alexandr Ciornii