views:

78

answers:

3

I have a function to convert documents into different formats, which then calls another function based on the type document. It's pretty straight forward for everything aside from HTML documents which require a bit of cleaning up, and that cleaning up is different based on where it's come from. So I had the idea that I could pass a reference to a subroutine to the convert function so the caller has the opportunity to modify the HTML, kinda like so (I'm not at work so this isn't copy-and-pasted):

package Converter;
...
sub convert
{
    my ($self, $filename, $coderef) = @_;

    if ($filename =~ /html?$/i) {
        $self->_convert_html($filename, $coderef);
    }
}

sub _convert_html
{
    my ($self, $filename, $coderef) = @_;

    my $html = $self->slurp($filename);
    $coderef->(\$html); #this modifies the html
    $self->save_to_file($filename, $html);
}

which is then called by:

Converter->new->convert("./whatever.html", sub { s/<html>/<xml>/i });

I've tried a couple of different things along these lines but I keep on getting 'Use of uninitialized value in substitution (s///)'. Is there any way of doing what I'm trying to do?

Thanks

+2  A: 

Try this:

Converter->new->convert("./whatever.html", sub { ${$_[0]} =~ s/<html>/<xml>/i; });

You are getting an uninitialized value warning because the substitution isn't being given anything to operate on ($_ is undefined in its scope). You need to tell it where to find its value (in @_, as a reference).

If you want to be fancy you could make the coderef operate on all its args by default:

sub { map { $$_ =~ s/<html>/<xml>/i } @_ }
Ether
For the map code example, I'm wondering what I'd want the return value to be. :)
brian d foy
@brian: indeed; normally I'd write a conversion sub to return the new value(s), rather than use references.
Ether
+3  A: 

Remember that an s/// by itself operates on $_, but your scalar reference is being passed into your callback sub as an argument, and is therefore in the @_ array.

So you can just change your callback sub to something like this:

sub { my ( $ref ) = @_; $$ref =~ s/<html>/<xml>/i }

Or, you could take advantage of the aliased nature of Perl subroutine arguments, and modify it directly:

sub _convert_html { 
    ...
    $coderef->( $html );
}

and then

sub { $_[0] =~ s/<html>/<xml>/i }

(This will actually modify the original string, as long as the argument is a scalar variable and not a literal string.)

friedo
+4  A: 

If it were me, I would avoid modifying the scalar ref and just return the changed value:

sub _convert_html
{
    my ($self, $filename, $coderef) = @_;

    my $html = $self->slurp($filename);
    $html = $coderef->( $html ); #this modifies the html
    $self->save_to_file($filename, $html);
}

However, if you want to modify a sub's arguments, it is worth knowing that all sub arguments are pass-by-reference in Perl (the elements of @_ are aliased to the arguments of the sub call). So your conversion sub can look like:

sub { $_[0] =~ s/<html>/<xml>/ }

But if you really want to operate on $_, like you have in your desired code example, you need to make _convert_html() look like:

sub _convert_html
{
    my ($self, $filename, $coderef) = @_;

    my $html = $self->slurp($filename);

    $coderef->() for $html;

    $self->save_to_file($filename, $html);
}

The for is an easy way to properly localize $_. You can also do:

sub _convert_html
{
    my ($self, $filename, $coderef) = @_;

    local $_ = $self->slurp($filename);

    $coderef->();

    $self->save_to_file($filename, $_);
}
daotoad
Thanks for the detailed explanation :-)
Mark
That's a lot of copying: first onto the argument stack, then change it, and copy it back onto the argument stack. That can hurt if you have to process many, many pages.
brian d foy
note that if speed is important, `local` is faster than `for` with one item
Eric Strom
note that if you notice the difference in speed between local and for, you are doing something else wrong.
jrockway
Also, Sub::AliasedUnderscore is a cleaner way than either local or for.
jrockway
@brian, yeah it can be a lot of copying--especially since it looks like `save_to_file()` also makes a copy. Even so, I generally prefer to work on copies when possible; "spooky action at a distance" bugs won't happen if you work only on copies. However, if profiling or expected usage suggests that we need to be more efficient, then I would definitely put the kibosh on the extra copying and use references to speed things up. My default position is to favor robustness in my APIs over absolute efficiency. There are many cases where efficiency gains are worth making some sacrifices.
daotoad
@jrockway, Nice module. Is there a reason why you localize the typeglob (`*_`) rather than just `$_`?
daotoad