views:

326

answers:

6

option A:

print $fh $hr->{'something'}, "|", $hr->{'somethingelse'}, "\n";

option B:

print $fh $hr->{'something'} . "|" . $hr->{'somethingelse'} . "\n";
+13  A: 

Unless you are executing millions of these statements, the performance difference will not matter. I really suggest concentrating on performance problems where they do exist - and the only way to find that out is to profile your application.

Premature optimization is something that Joel and Jeff had a podcast on, and whined about, for years. It's just a waste of time to try to optimize something until you KNOW that it's slow.

Alex
I am executing tens of thousands of them.
Trivial amount. Is this a performance-critical app?
Chris Simmons
+3  A: 

UPDATE: I just ran my own test.

At 1,000,000 iterations, each version took < 1 second.

At 10 million iterations, the list version averaged 2.35 seconds vs. 2.1 seconds for the string-concat version.

DVK
See my various benchmarking talks and maybe the benchmarking chapter in Mastering Perl to see why those numbers are meaningless.
brian d foy
OK, I'll bite. What are these talks and where can I access them? :) [I do own a copy of M.P. and will re-read the chapter.] I don't take the benchmarks as the holy grail, but ceteris paribus they could be of some use, and statistics don't always lie :)
DVK
@DVK: I would imagine he means here: http://www252.pair.com/comdog/
Telemachus
@DVK: I don't know where all my talks are. I just google for them when I need them. Seriously, I google for my own stuff.
brian d foy
+3  A: 

Have you actually tried profiling this? Only takes a few seconds.

On my machine, it appears that B is faster. However, you should really have a look at Pareto Analysis. You've already wasted far, far more time thinking about this question than you'd ever save in any program run. For problems as trivial as this (character substitution!), you should wait to care until you actually have a problem.
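
If you do want to measure it yourself, a quick comparison takes only a few lines with the core Benchmark module (the /dev/null target and the hash contents below are stand-ins, not from the question):

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# stand-in data; substitute your own filehandle and hash
my $hr = { something => 'foo', somethingelse => 'bar' };
open my $fh, '>', '/dev/null' or die $!;

cmpthese(-3, {    # run each variant for at least 3 CPU seconds
    list   => sub { print $fh $hr->{'something'}, "|", $hr->{'somethingelse'}, "\n" },
    concat => sub { print $fh $hr->{'something'} . "|" . $hr->{'somethingelse'} . "\n" },
});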

Chris Simmons
http://fetter.org/optimization.html and your comment should be an 8th rule: "8. If the time it takes you to think about optimization outweighs the gains you can possibly receive, optimization is over."
jsoverson
@jsoverson In very rare circumstances, this is not true. Shaving a hair of a nanosecond may mean the difference between working and not working for things involving parts in the real world (think pacemakers). The amount of time saved over the lifespan of the program may never reach the amount of time spent making it work, but if that hair of a nanosecond is necessary, it doesn't matter. That said, when you do go looking for that tiny slice of time, you should start with profiling, etc.
Chas. Owens
@Chas: sure, but you don't use Perl in those cases because there's nothing that guarantees performance. However, you do not contradict what @jsoverson says; it's all about the benefit versus the cost. Working == benefit.
brian d foy
Another example for Chas. Owens' comment, one that may be more applicable to Perl: a programmer investing a week to optimize an application before it is demoed to a lot of CEOs who are interested in purchasing it; even if the total time saved is only a couple of minutes (a few seconds per CEO).
Inshallah
@Chas If you're asking trivial micro optimization questions on Stack Overflow I hope to GOD you're not writing medical software.
Schwern
@Schwern I chose a medical device due to recent exposure to a hospital. Anything involving the real world may require that sort of speed: robotics, for example.
Chas. Owens
A: 

Of the three options, I would probably choose string interpolation first and switch to commas for expressions that cannot be interpolated. This, humorously enough, means that my default choice is the slowest of the bunch, but given that they are all so close to each other in speed and that disk speed is probably going to be slower than anything else, I don't believe changing the method has any real performance benefits.

As others have said, write the code, then profile the code, then examine the algorithms and data structures you have chosen that are in the slow parts of the code, and, finally, look at the implementation of the algorithms and data structures. Anything else is foolish micro-optimizing that wastes more time than it saves.

You may also want to read perldoc perlperf.

           Rate string concat  comma
string 803887/s     --    -0%    -7%
concat 803888/s     0%     --    -7%
comma  865570/s     8%     8%     --
#!/usr/bin/perl

use strict;
use warnings;

use Carp;
use List::Util qw/first/;
use Benchmark;

sub benchmark {
    my $subs = shift;

    my ($k, $sub) = each %$subs;
    my $value = $sub->();
    croak "bad" if first { $value ne $_->() and print "$value\n", $_->(), "\n" } values %$subs;

    Benchmark::cmpthese -1, $subs;
}

sub fake_print {
    # this, plus actually writing the output, is what print does:
    # print joins its arguments with $, (the output field separator)
    no warnings;
    my $output = join $,, @_;
    return $output;
}

my ($x, $y) = ("a", "b");
benchmark {
    comma  => sub { return fake_print $x, "|", $y, "\n"     },
    concat => sub { return fake_print $x .  "|" . $y . "\n" },
    string => sub { return fake_print "$x|$y\n"             },
};
Chas. Owens
Your fake_print() might simulate the user level steps that the real print() goes through, but it does not do them in the same way and thus cannot be used for benchmarking the performance of print(). What you actually benchmarked is the difference between passing arguments as a list and concatenating them. There is also a small performance difference between calling a subroutine with one argument and many which further poisons the results. And, most importantly, by not calling print it misses the vital point that I/O swamps all other performance considerations.
Schwern
+4  A: 

Perl is a high-level language, and as such the statements you see in the source code don't map directly to what the computer is actually going to do. You might find that a particular implementation of perl makes one thing faster than the other, but that's no guarantee that another implementation won't take away the advantage (although they try not to make things slower).

If you're worried about I/O speed, there are a lot more interesting and useful things to tweak before you start worrying about commas and periods. See, for instance, the discussion under Perl write speed mystery.
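
For instance (a sketch, not from the linked discussion; the data, filehandle, and key names are all illustrative), batching many records into one large print is the kind of I/O-level change that can matter far more than commas versus periods:

# Sketch: buffer many records and write in large chunks instead of
# issuing one print per record.
my @records = map { { something => "a$_", somethingelse => "b$_" } } 1 .. 100_000;
open my $fh, '>', '/dev/null' or die $!;

my $buf = '';
for my $hr (@records) {
    $buf .= "$hr->{something}|$hr->{somethingelse}\n";
    if (length($buf) >= 65_536) {    # flush roughly every 64KB
        print $fh $buf;
        $buf = '';
    }
}
print $fh $buf if length $buf;    # flush the remainder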

brian d foy
Brian, I'm afraid I must disagree with the latter point - I don't think he was worried about IO speed per se, since the actual output to the IO device would be 100% identical. Although I completely agree with the overall tenor of the idea that there are MUCH more impactful things to optimize in an average Perl program than this specific syntactic difference.
DVK
@DVK: see his follow-up comment to his question. He wants to know which is faster.
brian d foy
+12  A: 

The answer is simple: it doesn't matter. As many folks have pointed out, this is not going to be your program's bottleneck. Even optimizing this to happen instantly is unlikely to have any effect on your performance. You must profile first, otherwise you are just guessing and wasting your time.

If we are going to waste time on it, let's at least do it right. Below is the code to do a realistic benchmark. It actually does the print and sends the benchmarking information to STDERR. You run it as perl benchmark.plx > /dev/null to keep the output from flooding your screen.

Here's 5 million iterations writing to STDOUT. By using both timethese() and cmpthese() we get all the benchmarking data.

$ perl ~/tmp/bench.plx 5000000 > /dev/null
Benchmark: timing 5000000 iterations of concat, list...
    concat:  3 wallclock secs ( 3.84 usr +  0.12 sys =  3.96 CPU) @ 1262626.26/s (n=5000000)
      list:  4 wallclock secs ( 3.57 usr +  0.12 sys =  3.69 CPU) @ 1355013.55/s (n=5000000)
            Rate concat   list
concat 1262626/s     --    -7%
list   1355014/s     7%     --

And here's 5 million writing to a temp file:

$ perl ~/tmp/bench.plx 5000000
Benchmark: timing 5000000 iterations of concat, list...
    concat:  6 wallclock secs ( 3.94 usr +  1.05 sys =  4.99 CPU) @ 1002004.01/s (n=5000000)
      list:  7 wallclock secs ( 3.64 usr +  1.06 sys =  4.70 CPU) @ 1063829.79/s (n=5000000)
            Rate concat   list
concat 1002004/s     --    -6%
list   1063830/s     6%     --

Note the extra wallclock and sys time, underscoring how what you're printing to matters as much as what you're printing.

The list version is about 5% faster (note this is counter to Pavel's logic, underlining the futility of trying to just think this stuff through). You said you're doing tens of thousands of these? Let's see... 100k takes 146ms of wallclock time on my laptop (which has crappy I/O), so the best you can do here is to shave off about 7ms. Congratulations. If you spent even a minute thinking about this, it will take 40k iterations of that code before you've made up that time. This is not to mention the opportunity cost: in that minute you could have been optimizing something far more important.

Now, somebody's going to say "now that we know which way is faster we should write it the fast way and save that time in every program we write, making the whole exercise worthwhile!" No. First, it will still add up to an insignificant portion of your program's run time, far less than the 5% you get measuring a single statement. Second, logic like that causes you to prioritize micro-optimizations over maintainability.

Oh, and it's different in 5.8.8 than in 5.10.0.

$ perl5.8.8 ~/tmp/bench.plx 5000000 > /dev/null
Benchmark: timing 5000000 iterations of concat, list...
    concat:  3 wallclock secs ( 3.69 usr +  0.04 sys =  3.73 CPU) @ 1340482.57/s (n=5000000)
      list:  5 wallclock secs ( 3.97 usr +  0.06 sys =  4.03 CPU) @ 1240694.79/s (n=5000000)
            Rate   list concat
list   1240695/s     --    -7%
concat 1340483/s     8%     --

It might even change depending on what Perl I/O layer you're using and what operating system you're on. So the whole exercise is futile.
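
For illustration (paths made up; see perldoc PerlIO), the layer stack is chosen at open time and changes the buffering behavior a benchmark like this would measure:

# two opens through different PerlIO layer stacks
open my $buffered, '>',      '/tmp/a.out' or die $!;  # default buffered stack
open my $raw,      '>:unix', '/tmp/b.out' or die $!;  # raw, unbuffered syscall layer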

Micro-optimization is a fool's game. Always profile first and look at optimizing your algorithm. Devel::NYTProf is an excellent profiler.
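
Running it is two commands (the script name here is just a placeholder; nytprofhtml ships with Devel::NYTProf):

$ perl -d:NYTProf yourscript.pl   # writes profile data to ./nytprof.out
$ nytprofhtml                     # turns it into an HTML report in ./nytprof/

And here is the benchmark script itself: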

#!/usr/bin/perl -w

use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

#open my $fh, ">", "/tmp/test.out" or die $!;  # uncomment to benchmark a file
#open my $fh, ">", "/dev/null" or die $!;      # ...or the bit bucket
my $fh = *STDOUT;
my $hash = {
    foo => "something and stuff",
    bar => "and some other stuff"
};

select *STDERR;  # send the benchmark report to STDERR, away from the timed output
my $r = timethese(shift || -3, {
    list => sub {
        print $fh $hash->{foo}, "|", $hash->{bar};
    },
    concat => sub {
        print $fh $hash->{foo}. "|". $hash->{bar};
    },
});
cmpthese($r);
Schwern
My system keeps telling me that list is slower, even when I run your benchmark. This also shows that you can't choose one of these variants: on some systems the former is faster, on others the latter.
Pavel Shved
Please stop wasting time putting up experiments like this. They've been done millions of times. The performance difference doesn't matter.
Alex