For example,
my $str = '中國c'; # the Chinese word for "China", plus a Latin 'c'
I want to print out the numeric (Unicode code point) values:
20013,22283,99
See perldoc -f ord:
foreach my $c (split(//, $str))
{
print ord($c), "\n";
}
Or, compressed into a single line:
my @codepoints = map { ord } split //, $str;
Dumped with Data::Dumper, this produces:
$VAR1 = [
20013,
22283,
99
];
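For reference, here is a minimal sketch of the one-liner as a complete script that prints the dump above. It includes the use utf8; pragma, without which a UTF-8 encoded literal would be read as raw bytes rather than as three characters.

```perl
use strict;
use warnings;
use utf8;          # read the literal below as characters, not UTF-8 bytes
use Data::Dumper;

my $str = '中國c';
# split // yields one-character strings; ord maps each one to its code point
my @codepoints = map { ord } split //, $str;
print Dumper(\@codepoints);
```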
To have UTF-8 in your source code recognized as such, you must put use utf8; before any UTF-8 literals:
$ perl
use utf8;
my $str = '中國c'; # the Chinese word for "China", plus a Latin 'c'
foreach my $c (split(//, $str))
{
print ord($c), "\n";
}
__END__
20013
22283
99
Or, more tersely:
print join ',', map ord, split //, $str;
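To see what use utf8; changes, the sketch below builds the seven UTF-8 bytes of '中國c' explicitly with \x escapes (so it does not depend on how the file is saved), which is what the literal becomes when the pragma is missing. ord then reports byte values, not code points. Decoding with Encode::decode, from the core Encode module, recovers the character string. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The UTF-8 encoding of '中國c': E4 B8 AD (中), E5 9C 8B (國), then 'c'.
# Braces keep \x{8B} from swallowing the following 'c' as a hex digit.
my $bytes = "\x{E4}\x{B8}\x{AD}\x{E5}\x{9C}\x{8B}c";
my @byte_values = map { ord } split //, $bytes;
print join(',', @byte_values), "\n";    # 228,184,173,229,156,139,99

# Decoding turns the bytes into the same character string use utf8 would give.
my $chars = decode('UTF-8', $bytes);
my @codepoints = map { ord } split //, $chars;
print join(',', @codepoints), "\n";     # 20013,22283,99
```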
unpack will be more efficient than split and ord, because it doesn't have to make a bunch of temporary one-character strings:
use utf8;
my $str = '中國c'; # the Chinese word for "China", plus a Latin 'c'
my @codepoints = unpack 'U*', $str;
print join(',', @codepoints) . "\n"; # prints 20013,22283,99
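As a quick check that unpack 'U*' really returns code points, a minimal sketch can rebuild the string from them: pack with the same 'U*' template goes the other way, and chr does the same one code point at a time. This round trip is an illustration added here, not part of the benchmark below.

```perl
use strict;
use warnings;
use utf8;

my $str = '中國c';
my @codepoints = unpack 'U*', $str;

# pack 'U*' turns a list of code points back into a character string
my $rebuilt = pack 'U*', @codepoints;
# chr does the same conversion for a single code point
my $via_chr = join '', map { chr } @codepoints;

print $rebuilt eq $str ? "pack round trip ok\n" : "pack mismatch\n";
print $via_chr eq $str ? "chr round trip ok\n" : "chr mismatch\n";
```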
A quick benchmark shows unpack is about three times faster than split + ord:
use utf8;
use Benchmark 'cmpthese';
my $str = '中國中國中國中國中國中國中國中國中國中國中國中國中國中國c';
cmpthese(0, {
'unpack' => sub { my @codepoints = unpack 'U*', $str; },
'split-map' => sub { my @codepoints = map { ord } split //, $str },
'split-for' => sub { my @cp; for my $c (split(//, $str)) { push @cp, ord($c) } },
'split-for2' => sub { my $cp; for my $c (split(//, $str)) { $cp = ord($c) } },
});
Results:
                Rate  split-map  split-for split-for2     unpack
split-map    85423/s         --        -7%       -32%       -67%
split-for    91950/s         8%         --       -27%       -64%
split-for2  125550/s        47%        37%         --       -51%
unpack      256941/s       201%       179%       105%         --
The difference is less pronounced with a shorter string, but unpack is still more than twice as fast. (split-for2 is a bit faster than the other split variants because it doesn't build a list of code points.)
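To check the short-string claim on your own machine, a sketch like the following times the two main approaches on the original three-character string. It first confirms both produce the same code points. A negative count tells Benchmark's cmpthese to run each sub for at least that many CPU seconds. Rates depend on the machine and perl version, so no numbers are quoted here.

```perl
use strict;
use warnings;
use utf8;
use Benchmark 'cmpthese';

my $str = '中國c';    # the short string from the question

# Sanity check: both approaches must agree before we time them.
my $via_unpack = join ',', unpack 'U*', $str;
my $via_split  = join ',', map { ord } split //, $str;
die "mismatch: $via_unpack vs $via_split\n" unless $via_unpack eq $via_split;

# -1: run each sub for at least 1 CPU second, then print a comparison table
cmpthese(-1, {
    'unpack'    => sub { my @codepoints = unpack 'U*', $str },
    'split-map' => sub { my @codepoints = map { ord } split //, $str },
});
```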