Before posting my question to the ActiveState forum, I'd like to try luck here :)
I'm trying to convert a simple script of mine to .exe file using Perlapp (version 8.1). The Perl script works fine and it seems Perlapp also did its job successfully. But the converted .exe file has some weird behavior, which, I believe, must be related to utf-8 encoding. For example, the Perl script would yield the result like:
hàn huáng zhòng sè sī qīng guó
But running the executable file would give me only this:
h hu zh s s q gu
I've already configured Perlapp so that utf8.pm is explicitly added but the problem just refuses to go away. I've tried something else. For example,
binmode DATA, ":utf8";
and
">:encoding(utf8)"
but without any luck;
Can anyone kindly give me some hint as to what might be the reason? Thanks like always :)
I can post the whole code here but it seems unnecessary so I just paste some snippets of the code that I think is relevant to the problem:
use utf8;
%zidian = map {chomp;split/\s+/,$_,2} <DATA>;
open my $in,'<:utf8',"./original.txt";
open my $out,'>:utf8',"./modified.txt";
if ( $code~~%zidian) {
$value = lc$zidian{$code};
}
__DATA__
3400 Qiū
3401 TIǎN
3404 KUà
3405 Wǔ
And one more thing, I'm running ActivePerl 5.10.0.on Windows XP (Chinese Version) and the script is saved as utf-8 encoding without BOM. PerlApp cannot handle a script that has BOM.
Edit
If I were to give a workable snippet, then I suppose it's like giving the whole code because I'm using three inter-connected sub-routines, which I take with some modifications from Lingua::han::Pinyin module and Lingua::han::Utils module.
#! perl
# to make good vertical alignment,
# set font family to SonTi and font size to Four(12pts)
use utf8;
sub Unihan {
my $hanzi = shift;
my @unihan = map { uc sprintf("%x",$_) } unpack ("U*", $hanzi);
}
sub csplit {
my $hanzi = shift;
my @return_hanzi;
my @code = Unihan($hanzi);
foreach my $code (@code) {
my $value = pack("U*", hex $code);
push @return_hanzi, $value if ($value);
}
return wantarray ? @return_hanzi : join( '', @return_hanzi );
}
%zidian = map {chomp;split/\s+/,$_,2} <DATA>;
sub han2pinyin {
my $hanzi = shift;
my @pinyin;
my @code = Unihan($hanzi);
foreach $code (@code) {
if ( $code~~%zidian) {
$value = lc$zidian{$code};
}
else {
$value = " ";
}
push @pinyin, $value;
}
return wantarray ? @pinyin : join( '', @pinyin );
}
open $in,'<:utf8',"./original.txt";
seek $in, 3,0;
open $out,'>:utf8',"./modified.txt";
while(<$in>){
s/(.{18})/$1\n/g;
push @tmp, $_;
}
foreach (@tmp){
my @hanzi;
my @pinyin;
@hanzi = csplit($_);
my $hang = join "", @hanzi;
@pinyin = han2pinyin($hang);
for ( my $i = 0; $i < @hanzi && $i < @pinyin; ++$i ) {
if ( $hanzi[$i] =~ /[\xEFBC8C]|[\xE38082]|[\xEFBC81]|[\xEFBC9F]|[\xE2809C]|[\xE2809D]|[\xEFBC9A]/ ) {
splice(@pinyin, $i, 0," ");
}
}
printf $out "%-7s" x @pinyin, @pinyin;
print $out "\n";
printf $out "%-6s" x @hanzi, @hanzi;
print $out "\n";
}
__DATA__
3400 Qiū
3401 TIǎN
3404 KUà
3405 Wǔ