I have been working on several Perl scripts that process large fixed-width data files, extracting small substrings from each data record. I had assumed that delegating the substring extraction to subroutine calls would be costly, because of the overhead of copying the data record into the @_ array. So I ran the following benchmark to compare (a) a direct call to substr(), (b) a subroutine call passing the data record as a string, and (c) a subroutine call passing the data record by reference.
use strict;
use warnings;
use Benchmark qw(timethese);
my $RECORD = '0' x 50000;
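# Declare $v before the loop: "my" combined with a statement modifier has undefined behavior.
my $direct = sub { my $v; $v = substr( $RECORD, $_, 1) for 0..999 };   # (a) direct substr()
my $byVal  = sub { my $v; $v = ByVal ( $RECORD, $_)    for 0..999 };   # (b) record passed as a string
my $byRef  = sub { my $v; $v = ByRef (\$RECORD, $_)    for 0..999 };   # (c) record passed by reference
# Both subs read straight from @_, whose elements alias the caller's arguments.
sub ByVal { return substr( $_[0], $_[1], 1) }
sub ByRef { return substr(${$_[0]}, $_[1], 1) }
timethese( 10000, {
    direct => $direct,
    byVal  => $byVal,
    byRef  => $byRef,
} );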
# Variants that first copy their first argument into a lexical before calling substr().
my $byVal2loc = sub { my $v; $v = ByVal2loc( $RECORD, $_) for 0..999 };
my $byRef2loc = sub { my $v; $v = ByRef2loc(\$RECORD, $_) for 0..999 };
sub ByVal2loc { my $arg = shift; return substr( $arg, $_[0], 1) }   # copies the whole record
sub ByRef2loc { my $arg = shift; return substr( $$arg, $_[0], 1) }  # copies only the reference
timethese( 10000, {
    byVal2loc => $byVal2loc,
    byRef2loc => $byRef2loc,
} );
# Produces this output:
Benchmark: timing 10000 iterations of byRef, byVal, direct...
byRef: 19 wallclock secs...
byVal: 15 wallclock secs...
direct: 4 wallclock secs...
Benchmark: timing 10000 iterations of byRef2loc, byVal2loc...
byRef2loc: 21 wallclock secs...
byVal2loc: 119 wallclock secs...
As expected, calling substr() directly was the fastest. However, I was surprised to find no penalty from the "copying of data" I had been imagining: passing the record by value was, if anything, slightly faster than passing it by reference (15 vs. 19 seconds). Even when I increased the width of the record to outlandish proportions (for example, a billion characters), the by-value and by-reference benchmarks remained roughly the same.
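The timings suggest that no copy is made. A more direct check, independent of timing, is to compare the address of the caller's scalar with the address of the corresponding @_ element. This is a minimal sketch using refaddr from the core Scalar::Util module; arg_address is a hypothetical name chosen for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $RECORD = '0' x 50_000;

# Hypothetical helper: returns the address of the scalar it received as $_[0].
sub arg_address { return refaddr( \$_[0] ) }

my $caller_addr = refaddr( \$RECORD );
my $callee_addr = arg_address($RECORD);

# Equal addresses mean $_[0] is $RECORD itself, so nothing was copied,
# however long the string is.
print $caller_addr == $callee_addr ? "same scalar\n" : "copied\n";
```

Because the comparison is on the scalar itself rather than on elapsed time, the result should not depend on the record length.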
It seems that when passing arguments to subroutines, Perl does not copy the data. On reflection this makes sense, given the aliasing behavior of @_: its elements are aliases for the caller's actual arguments, so the arguments are effectively passed by reference, not by value.
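The aliasing can also be observed by writing through @_: if the subroutine had received a copy, changing $_[0] could not affect the caller. A small sketch, with a made-up subroutine name, using the four-argument form of substr() to replace a character in place:

```perl
use strict;
use warnings;

my $RECORD = 'abcdef';

# Hypothetical sub that writes through its first argument.
# $_[0] is an alias for the caller's scalar, so the 4-argument substr
# replaces the first character of $RECORD itself.
sub blank_first_char { substr( $_[0], 0, 1, ' ' ); return }

blank_first_char($RECORD);
print "[$RECORD]\n";    # prints "[ bcdef]"
```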
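However, it is a limited form of pass-by-reference: the elements of @_ are aliases rather than true references, and assigning one to a lexical (my) variable inside the subroutine, for example with my $arg = shift, copies the value instead of the alias. The second set of benchmarks illustrates this: byVal2loc, which copies the 50,000-character record on every call, was far slower than byRef2loc, which copies only a reference.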
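The copy happens at the assignment, not at the call. The sketch below contrasts three cases using hypothetical sub names: my $arg = shift gives the sub a separate scalar holding a copy; shifting a reference copies only the reference, so $$ref is still the caller's scalar; and a foreach loop over $_[0] gives a named alias without copying, because a foreach loop variable aliases each element it iterates over.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $RECORD   = '0' x 50_000;
my $rec_addr = refaddr( \$RECORD );

# Hypothetical subs, each returning the address of the scalar it ends up reading.

# my $arg = shift: $arg is a distinct scalar that receives a copy of the value.
sub copy_addr { my $arg = shift; return refaddr( \$arg ) }

# Reference argument: only the reference is copied; $$ref is the caller's scalar.
sub ref_addr { my $ref = shift; return refaddr($ref) }

# foreach aliasing: $rec is another name for $_[0], i.e. for the caller's scalar.
sub alias_addr {
    for my $rec ( $_[0] ) { return refaddr( \$rec ) }
    return;
}

my %same_scalar = (
    'my $arg = shift' => copy_addr($RECORD)   == $rec_addr,
    'reference arg'   => ref_addr( \$RECORD ) == $rec_addr,
    'foreach alias'   => alias_addr($RECORD)  == $rec_addr,
);

for my $case ( sort keys %same_scalar ) {
    printf "%-16s %s\n", $case, $same_scalar{$case} ? 'same scalar' : 'copy';
}
```

The foreach form is only one way to get a named alias; reading $_[0] directly, as ByVal does in the benchmark, avoids the copy as well.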
Am I understanding this correctly?