I've been running across a lot of Perl code that breaks long strings up this way:

my $string = "Hi, I am a very long and chatty string that just won't";
$string .= " quit.  I'm going to keep going, and going, and going,";
$string .= " kind of like the Energizer bunny.  What are you going to";
$string .= " do about it?";

From my background with Java, building a string like this would be a performance no-no. Is the same true with Perl? In my searches, I have read that using join on an array of strings is the fastest way to concatenate strings, but what about when you just want to break up a string for readability? Is it better to write:

my $string = "Hi, I am a very long and chatty string that just won't" .
    " quit.  I'm going to keep going, and going, and going," .
    " kind of like the Energizer bunny.  What are you going to" .
    " do about it?";

Or do I use join, or how should it be done?

+1  A: 

Use whichever one you like better; the performance of those is exactly the same in perl. Perl strings are not like Java strings, and can be modified in-place.

JSBangs
The performance of the two examples is the same, or the performance of `join` is the same as either example? I have a little bit of a hard time believing that `join` is the same (since `join` is generally a fully native function call), but if any language would have optimized the string concatenation operator, I'm sure it'd be perl...
Weston C
Concat is not as performant as join. See my answer for the reason why.
fatcat1111
The performance of both examples is the same. The performance of `join` is something else, which may be more and may be less. In any case, perl is not a high-performance language and the cost of string concatenation or calling `join` is unlikely to matter in the least.
JSBangs
While I agree that perl is not a high-performance language, that isn't what the OP was asking about. He asked which solution performed better. The difference may be at best second order, but I would rather give a direct answer to a direct question than second-guess the poster.
fatcat1111
It's hard to give up years of code reviews beating into my head, "Use `StringBuilder`" and such to read through code that in Java looks wrong. But it will hopefully help me write more Perl-ish Perl!
justkt
+11  A: 

Camel book, p 598:

Prefer join("", ...) to a series of concatenated strings. Multiple concatenations may cause strings to be copied back and forth multiple times. The join operator avoids this.
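As a minimal sketch of what that advice looks like when the pieces are only known at runtime (the `@pieces` list is made up for illustration and is not from the book):

```perl
use strict;
use warnings;

# Hypothetical runtime data; in real code these pieces would come from
# input, a database, etc., so they cannot be folded at compile time.
my @pieces = map { "item $_" } 1 .. 5;

# Repeated concatenation: appends to $concat once per piece.
my $concat = '';
$concat .= $_ for @pieces;

# join: builds the result from the whole list in a single call.
my $joined = join '', @pieces;

print "same result\n" if $concat eq $joined;
```

Both forms produce the same string; the only question is how much copying happens along the way, and that is worth measuring rather than assuming.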

fatcat1111
+3  A: 

The main performance difference between your two examples is that in the first, the concatenation happens each time the code is called, whereas in the second, the constant strings will be folded together by the compiler.

So if either of these examples will be in a loop or function called many times, the second example will be faster.

This assumes the strings are known at compile time. If you are building up the strings at runtime, as fatcat1111 mentions, the join operator will be faster than repeated concatenation.
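One way to check the folding yourself is to ask the core B::Deparse module to print the compiled form of a sub. This is only a sketch; the sub and its strings are made up for illustration, and the exact layout of the deparsed text varies between Perl versions:

```perl
use strict;
use warnings;
use B::Deparse;

# A sub that "concatenates" nothing but string literals.
my $folded = sub {
    my $string = "going, " . "and going, " . "and going";
    return $string;
};

# coderef2text returns Perl source reconstructed from the compiled op tree.
my $source = B::Deparse->new->coderef2text($folded);
print $source, "\n";
```

If the constants were folded, the printed source should show one string literal being assigned rather than a chain of `.` operators.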

Eric Strom
A: 

You don't need to do any of that stuff. You can just assign the whole string to a variable at once.

my $string = "Hi, I am a very long and chatty string that just won't
 quit.  I'm going to keep going, and going, and going,
 kind of like the Energizer bunny.  What are you going to
 do about it?";
dirk
That will include newlines in the string, which is probably not what the user wants.
Ether
@Ether - correct. Props for not calling me "he", btw. It happens all the time, for some reason, when I'm not male.
justkt
@justkt: I'm not either. *high5* :)
Ether
+10  A: 

One more thing to add to this thread that hasn't been mentioned yet -- if you can, avoid joining/concatenating these strings. Many methods will take a list of strings as arguments, not just one string, so you can just pass them individually, e.g.:

print "this is",
    " perfectly legal",
    " because print will happily",
    " take a list and send all the",
    " strings to the output stream\n";

die "this is also",
    " perfectly acceptable";

use Log::Log4perl qw(:easy);
use Data::Dumper;
INFO("and this is just fine",
    " as well");

INFO(sub {
    local $Data::Dumper::Maxdepth = 1;
    "also note that many libraries will",
    " accept subrefs, in which you",
    " can perform operations which",
    " return a list of strings...",
    Dumper($obj);
 });
Ether
This is very good to know.
justkt
+5  A: 

I ran a benchmark! :)

#!/usr/bin/perl

use warnings;
use strict;

use Benchmark qw(cmpthese timethese);

my $bench = timethese($ARGV[1], {

  multi_concat => sub {
    my $string = "Hi, I am a very long and chatty string that just won't";
    $string .= " quit.  I'm going to keep going, and going, and going,";
    $string .= " kind of like the Energizer bunny.  What are you going to";
    $string .= " do about it?";
  },

  one_concat => sub {
    my $string = "Hi, I am a very long and chatty string that just won't" .
    " quit.  I'm going to keep going, and going, and going," .
    " kind of like the Energizer bunny.  What are you going to" .
    " do about it?";
  },

  join => sub {
    my $string = join("", "Hi, I am a very long and chatty string that just won't",
    " quit.  I'm going to keep going, and going, and going,",
    " kind of like the Energizer bunny.  What are you going to",
    " do about it?"
    );
  },

} );

cmpthese $bench;

1;

The results (on my iMac with Perl 5.8.9):

imac:Benchmarks seb$ ./strings.pl 1000
Benchmark: running join, multi_concat, one_concat for at least 3 CPU seconds...
      join:  2 wallclock secs ( 3.13 usr +  0.01 sys =  3.14 CPU) @ 3235869.43/s (n=10160630)
multi_concat:  3 wallclock secs ( 3.20 usr + -0.01 sys =  3.19 CPU) @ 3094491.85/s (n=9871429)
one_concat:  2 wallclock secs ( 3.43 usr +  0.01 sys =  3.44 CPU) @ 12602343.60/s (n=43352062)
                   Rate multi_concat         join   one_concat
multi_concat  3094492/s           --          -4%         -75%
join          3235869/s           5%           --         -74%
one_concat   12602344/s         307%         289%           --
sebthebert
`one_concat` is optimized by the compiler into a constant assignment with 0 concatenations at runtime.
Eric Strom
@Eric. Thanks - that pretty much answers my original question.
justkt
+1  A: 

In my benchmarks, join is only marginally faster than concatenation with reassignment and only on short lists of strings. Concatenation without reassignment is significantly faster than either. On longer lists, join performs conspicuously worse than concatenation with reassignment, probably because argument passing starts to dominate execution time.

4 strings:
          Rate   .= join    .
.=   2538071/s   --  -4% -18%
join 2645503/s   4%   -- -15%
.    3105590/s  22%  17%   --
1_000 strings:
         Rate join   .=
join 152439/s   -- -40%
.=   253807/s  66%   --

So in terms of your question, . beats .= for execution time, though not by enough that it's generally worth worrying about. Readability is almost always more important than performance, and .= is often a more readable form.

This is in the general case; as sebthebert's answer demonstrates, . is so much faster than .= in the concatenation-of-constants case that I'd be tempted to treat that as a rule.

(The benchmarks, by the way, are basically in the obvious form and I'd prefer not to repeat the code here. The only surprising thing is creating the initial strings from <DATA> so as to foil constant folding.)
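For anyone who wants to try something similar, here is a rough sketch of that shape. It is not the code behind the numbers above: the sub names are made up, and the strings are generated at runtime as a stand-in for reading them from <DATA>.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumption: the strings are generated at runtime here, standing in for
# the <DATA> trick, so the compiler cannot fold them into one constant.
my @strings = map { "string number $_ " } 1 .. 1_000;

# Concatenation with reassignment (.=), one append per string.
sub with_append {
    my $result = '';
    $result .= $_ for @strings;
    return $result;
}

# A single join over the whole list.
sub with_join {
    return join '', @strings;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    '.='   => \&with_append,
    'join' => \&with_join,
});
```

Change the `1 .. 1_000` range to compare short and long lists; the rates will depend on your Perl version and machine.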

D'A

darch