ansaurus

Question

In Perl, what is the sane way to convert a string into a list of its characters?

Answer 1

+4 A:

You're right. The standard way to do it is `split //, $string`. To make the code more readable, you can wrap it in a simple function:

sub get_characters {
    my ($string) = @_;
    return ( split //, $string );
}

my @characters = get_characters($string);

Ivan Nevostruev 2010-03-01 14:27:10

... and add comments inside the sub to describe the implementation.

toolic 2010-03-01 14:40:47
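A quick way to see what the `get_characters` helper (that is, `split //`) returns, including two edge cases: an empty string gives an empty list, and a newline comes back as a character of its own. This is a self-contained sketch; the sample strings are arbitrary.

```perl
use strict;
use warnings;

# Same helper as above: split with an empty pattern breaks the
# string apart between every character.
sub get_characters {
    my ($string) = @_;
    return split //, $string;
}

my @word  = get_characters("Perl");   # ('P', 'e', 'r', 'l')
my @empty = get_characters("");       # (): no characters at all
my @lines = get_characters("a\nb");   # ('a', "\n", 'b')

printf "%d %d %d\n", scalar @word, scalar @empty, scalar @lines;   # prints "4 0 3"
```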

Answer 2

+5 A:

It doesn't get much clearer than using the `split` function to split a string. I suppose you could argue that the null pattern is unintuitive, though I find it clear enough. If you want a "clean" alternative, wrap it in a sub:

my @characters = chars($string);
sub chars { split //, $_[0] }

Michael Carman 2010-03-01 14:27:40

You should shift it off too, right?

Mark Canlas 2010-03-01 15:30:49

You can if you think it's clearer but for such a small and simple function I generally don't bother. Note that the function doesn't change the value of `$_[0]`. If I were modifying the value I'd make a copy to avoid unexpected side effects for the caller.

Michael Carman 2010-03-01 16:04:00
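The reason copying matters: the elements of `@_` are aliases to the caller's arguments, so assigning to `$_[0]` would change the caller's variable, while copying into a lexical first (as `shift` or `my ($s) = @_` does) would not. A small illustration; `upcase_in_place` and `upcase_copy` are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Assigning to $_[0] writes through to the caller's variable,
# because @_ aliases the arguments that were passed in.
sub upcase_in_place { $_[0] = uc $_[0]; return }

# Copying the argument first leaves the caller's variable alone.
sub upcase_copy {
    my ($s) = @_;
    $s = uc $s;
    return $s;
}

my $word = "perl";
my $loud = upcase_copy($word);   # $word is still "perl"
upcase_in_place($word);          # $word is now "PERL"
print "$loud $word\n";           # prints "PERL PERL"
```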

Answer 3

+2 A:

Use split with a null pattern to break up the string into individual characters:

@characters = split //, $string;

If you just want the char codes, use unpack:

@values = unpack("C*", $string);

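Note that `C` unpacks octets, so this works as expected for ASCII or other byte strings but will not give you code points for wide (Unicode) characters; `use utf8` only tells Perl that the source file itself is UTF-8-encoded, so it does not change how `unpack` treats a string. You can also combine `unpack` and `chr` to split the string into individual characters, just because TMTOWTDI (there's more than one way to do it):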

@characters = map chr, unpack("C*", $string);

eugene y 2010-03-01 14:28:16

If your motto is "of every possible way to do it, pick the most unreadable one", this is a nice candidate. I'm not so bad at picking up new idioms, but pack/unpack somehow escape my grip.

reinierpost 2010-03-01 14:55:35

Question is, is this faster than a split?

DVK 2010-03-01 14:57:14

`@characters = unpack '(a)*', $string;` seems to work, too. Let's see what else we can dig up. :-)

hillu 2010-03-01 14:58:01

@hillu: If you want more obfuscated ways to do it... http://www.perlmonks.org/?node_id=54413

toolic 2010-03-01 15:02:34

I wasn't really looking for golfed/unreadable ways to do it, but almost all the answers seem to be headed in that general direction, so what the heck...

hillu 2010-03-01 15:05:39

pack and unpack may be a little cryptic, but they are very fast. For something pack or unpack can do directly, usually the only faster way to do it is in C.

Eric Strom 2010-03-01 16:19:38
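For an ASCII string, the `unpack` forms from the answer and comments above agree with each other and with `ord`/`split`: `unpack "C*"` gives the character codes, `chr` turns them back into characters, and `unpack '(a)*'` yields the characters directly. A sketch with an arbitrary sample string; since `C` reads octets, this agreement is only expected for byte strings like this one.

```perl
use strict;
use warnings;

my $string = "Hi!";

# Character codes, two ways: unpack "C*" reads unsigned octets,
# while ord returns the code point of each character.
my @values = unpack("C*", $string);
my @ords   = map { ord($_) } split //, $string;

# Back to characters: chr on each code, or unpack with an
# "(a)*" template that reads one character per group.
my @characters = map { chr($_) } @values;
my @chars_a    = unpack '(a)*', $string;

print "@values\n";                   # prints "72 105 33"
print join('', @characters), "\n";   # prints "Hi!"
```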

Answer 4

+5 A:

For less readable and more concise (and still with regex overkill):

@characters = $string =~ /./g;

(I learned this idiom from playing code-golf.)

mobrule 2010-03-01 14:45:38

Uh. This is disgusting in an exciting way. +1 :-)

hillu 2010-03-01 14:54:35

Answer 5

+4 A:

I prefer using the split technique. It is well-known, and it is documented.

Yet another way...

@characters = $string =~ /./gs;

toolic 2010-03-01 14:47:55

+1 (See my comment to mobrule's post)

hillu 2010-03-01 14:55:08
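The only difference between the two match forms above is the `/s` modifier: without it, `.` does not match a newline, so `/./g` silently drops newlines from the result. A small check; the sample string is arbitrary.

```perl
use strict;
use warnings;

my $string = "ab\ncd";

# Without /s, '.' matches any character except "\n".
my @no_s = $string =~ /./g;      # ('a', 'b', 'c', 'd')

# With /s, '.' matches every character, newline included.
my @with_s = $string =~ /./gs;   # ('a', 'b', "\n", 'c', 'd')

printf "%d vs %d\n", scalar @no_s, scalar @with_s;   # prints "4 vs 5"
```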

Answer 6

+4 A:

Why would using a regular expression be "overkill"? Many worry that regexes in Perl are overkill because they think that running them involves a highly complex and slow regex algorithm. That's not always true: the implementation is highly optimized, and many simple cases are treated specially, so what looks like a regex may actually perform as well as a simple substring search. I wouldn't be surprised at all if this type of split is optimized as well. In some tests I ran, `split` was faster than your `map`, and `unpack` appeared to be slightly faster than `split`.

I recommend `split` because it is the "idiomatic" way. You'll find it in perldoc and in many books, and any good Perl programmer should know it. (If you are not sure your audience will understand it, you can always add a comment to the code, as someone suggested.)

OTOH, if regexes are "overkill" only because the syntax is ugly, then it's too subjective for me to say anything. ;-)

itub 2010-03-01 15:19:24

Great answer. On overkill: I did not consider the run time at all. Regular expressions are well integrated into Perl, but when reading and trying to understand code, they often still require a mental context switch. That isn't much of a problem with the "idiomatic" expression using `split` and the empty match.

hillu 2010-03-01 20:41:05

Answer 7

+8 A:

Various examples, and speed comparisons.

I thought it would be a good idea to see how fast some of the ways of splitting a string into characters are.

I ran the test against several versions of Perl that I happen to have on my computer.

test.pl

use 5.010;
use Benchmark qw(:all) ;
my %bench = (
   'split' => sub{
     state $string = 'x' x 1000;
     my @chars = split //, $string;
     \@chars;
   },
   'split-string' => sub{
     state $string = 'x' x 1000;
     my @chars = split '', $string;
     \@chars;
   },
   'split-capture' => sub{
     state $string = 'x' x 1000;
     my @chars = split /(.)/, $string;
     \@chars;
   },
   'unpack' => sub{
     state $string = 'x' x 1000;
     my @chars = unpack( '(a)*', $string );
     \@chars;
   },
   'match' => sub{
     state $string = 'x' x 1000;
     my @chars = $string =~ /./gs;
     \@chars;
   },
   'match-capture' => sub{
     state $string = 'x' x 1000;
     my @chars = $string =~ /(.)/gs;
     \@chars;
   },
   'map-substr' => sub{
     state $string = 'x' x 1000;
     my @chars = map { substr $string, $_, 1 } 0 .. length($string) - 1;
     \@chars;
   },
);
# set the initial state of $string
$_->() for values %bench;
cmpthese( -10, \%bench );

for perl in /usr/bin/perl /opt/perl-5.10.1/bin/perl /opt/perl-5.11.2/bin/perl;
do
  $perl -v | perl -nlE'if( /(v5\.\d+\.\d+)/ ){
    say "## Perl $1";
    say "<pre>";
    last;
  }';
  $perl test.pl;
  echo -e '</pre>\n';
done

Perl v5.10.0

               Rate split-capture match-capture map-substr match unpack split split-string
split-capture 296/s            --          -20%       -20%  -23%   -58%  -63%         -63%
match-capture 368/s           24%            --        -0%   -4%   -48%  -54%         -54%
map-substr    370/s           25%            0%         --   -3%   -48%  -53%         -54%
match         382/s           29%            4%         3%    --   -46%  -52%         -52%
unpack        709/s          140%           93%        92%   86%     --  -11%         -11%
split         793/s          168%          115%       114%  107%    12%    --          -0%
split-string  795/s          169%          116%       115%  108%    12%    0%           --

Perl v5.10.1

               Rate split-capture map-substr match-capture match unpack split split-string
split-capture 301/s            --       -31%          -41%  -47%   -60%  -65%         -66%
map-substr    435/s           45%         --          -14%  -23%   -42%  -50%         -50%
match-capture 506/s           68%        16%            --  -10%   -32%  -42%         -42%
match         565/s           88%        30%           12%    --   -24%  -35%         -35%
unpack        743/s          147%        71%           47%   32%     --  -15%         -15%
split         869/s          189%       100%           72%   54%    17%    --          -1%
split-string  875/s          191%       101%           73%   55%    18%    1%           --

Perl v5.11.2

               Rate split-capture match-capture match map-substr unpack split-string split
split-capture 300/s            --          -28%  -32%       -38%   -59%         -63%  -63%
match-capture 420/s           40%            --   -5%       -13%   -42%         -48%  -49%
match         441/s           47%            5%    --        -9%   -39%         -46%  -46%
map-substr    482/s           60%           15%    9%         --   -34%         -41%  -41%
unpack        727/s          142%           73%   65%        51%     --         -10%  -11%
split-string  811/s          170%           93%   84%        68%    12%           --   -1%
split         816/s          171%           94%   85%        69%    12%           1%    --

As you can see, `split` is the fastest, because splitting on an empty pattern is special-cased in Perl's code for `split`.

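`split-capture` is the slowest, probably because it has to set `$1` along with several other match variables. Note also that it doesn't return the same list as the others: the captured characters come back interleaved with empty strings.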

So I would recommend going with plain old `split //, ...`, or the roughly equivalent `split '', ...`.

Brad Gilbert 2010-03-01 18:10:55

+1 good comparison. I'm surprised that `unpack '(a)*'` isn't faster. It would be good to see this with Unicode strings as well.

Eric Strom 2010-03-01 19:01:11
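Before trusting a benchmark, it is worth checking that the variants compute the same thing. This sketch runs the approaches from `test.pl` on a short sample string (chosen arbitrarily) and compares each result with plain `split //`; `split /(.)/` is expected to differ, since its captured characters are interleaved with empty fields.

```perl
use strict;
use warnings;

my $string = "abc";

# One entry per benchmarked variant, same names as in test.pl.
my %result = (
    'split'         => [ split //, $string ],
    'split-string'  => [ split '', $string ],
    'unpack'        => [ unpack '(a)*', $string ],
    'match'         => [ $string =~ /./gs ],
    'match-capture' => [ $string =~ /(.)/gs ],
    'map-substr'    => [ map { substr $string, $_, 1 } 0 .. length($string) - 1 ],
    'split-capture' => [ split /(.)/, $string ],
);

# Compare each variant with plain split //.
my $expected = join '|', @{ $result{'split'} };
for my $name (sort keys %result) {
    my $got = join '|', @{ $result{$name} };
    printf "%-14s %s\n", $name, $got eq $expected ? 'same' : "differs: ($got)";
}
```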
