Your problems arise because you neglect to decode binary data into Perl strings on input and to encode Perl strings back into binary data on output. Regular expressions, and their friend split, only work properly on Perl (character) strings; on undecoded bytes, . matches a single byte rather than a whole character.
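To make the difference concrete, here is a minimal sketch (not part of the programs below) that compares the same name as raw UTF-8 bytes and as a decoded Perl string. The variable names are only illustrative, and the bytes are written as escapes so the example does not depend on how the source file is saved.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The name 张小三 as it would arrive from a UTF-8 file: 3 characters, 9 bytes.
my $bytes = "\xE5\xBC\xA0\xE5\xB0\x8F\xE4\xB8\x89";
my $chars = decode('UTF-8', $bytes);

my $byte_count = length $bytes;    # length counts bytes here
my $char_count = length $chars;    # length counts characters here

# On undecoded data /./ matches one byte, which is only a fragment of 张.
my ($first_byte) = $bytes =~ /(.)/;
# On the decoded string /./ matches the whole character 张.
my ($first_char) = $chars =~ /(.)/;

printf "bytes: %d, characters: %d\n", $byte_count, $char_count;
print encode('UTF-8', "first character after decoding: $first_char\n");
```

The first line printed is "bytes: 9, characters: 3", which is why splitting the undecoded input after the first "character" cuts 张 apart.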
(?<=.)
means "after the first character" here: it is a lookbehind that matches at any position preceded by a character, and the limit of 2 passed to split stops the splitting after the first such position. As such, this program will not work correctly on 复姓/compound family names such as 欧阳; they are rare, but they do exist. To always split a name correctly into family name and given name, you need a dictionary of family names.
Linux version:
use strict;
use warnings;
use Encode qw(decode encode);
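while (my $full_name = <DATA>) {
    # Turn the UTF-8 bytes read from the file into a Perl string.
    $full_name = decode('UTF-8', $full_name);
    chomp $full_name;
    # Split once, after the first character.
    my ($family_name, $given_name) = split(/(?<=.)/, $full_name, 2);
    # Encode back to UTF-8 bytes for output; the newline keeps each name on its own line.
    print encode('UTF-8',
        sprintf("The full name is %s, the family name is %s, the given name is %s.\n",
            $full_name, $family_name, $given_name)
    );
}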
__DATA__
张小三
Output:
The full name is 张小三, the family name is 张, the given name is 小三.
Windows version:
use strict;
use warnings;
use Encode qw(decode encode);
use Encode::HanExtra qw();
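while (my $full_name = <DATA>) {
    # Turn the GB18030 bytes read from the file into a Perl string.
    $full_name = decode('GB18030', $full_name);
    chomp $full_name;
    # Split once, after the first character.
    my ($family_name, $given_name) = split(/(?<=.)/, $full_name, 2);
    # Encode back to GB18030 bytes for output; the newline keeps each name on its own line.
    print encode('GB18030',
        sprintf("The full name is %s, the family name is %s, the given name is %s.\n",
            $full_name, $family_name, $given_name)
    );
}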
__DATA__
张小三
Output:
The full name is 张小三, the family name is 张, the given name is 小三.
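The dictionary approach mentioned above could look roughly like the following sketch. The four-entry list of compound family names, the split_name helper and the rule that a given name must remain are all assumptions made for illustration; a real program would load a complete list and would still decode its input as in the programs above.

```perl
use strict;
use warnings;
use utf8;    # assumes this source file is saved as UTF-8
use Encode qw(encode);

# Hypothetical, deliberately tiny dictionary of compound family names.
my %compound_family_name = map { $_ => 1 } qw(欧阳 司马 诸葛 上官);

# Takes a decoded Perl string, returns (family name, given name).
sub split_name {
    my ($full_name) = @_;
    my $first_two = substr($full_name, 0, 2);
    # Use two characters only if they are in the dictionary and a given name remains.
    my $family_length =
        (length($full_name) > 2 && $compound_family_name{$first_two}) ? 2 : 1;
    return (substr($full_name, 0, $family_length), substr($full_name, $family_length));
}

for my $full_name (qw(张小三 欧阳修)) {
    my ($family_name, $given_name) = split_name($full_name);
    print encode('UTF-8',
        sprintf("%s: family name %s, given name %s\n", $full_name, $family_name, $given_name)
    );
}
```

With this sketch 张小三 is still split into 张 and 小三, while 欧阳修 is split into 欧阳 and 修. A name that could be read either way still needs a human decision; the dictionary only covers the common cases.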