I already know how to convert the content of a non-UTF-8 file to UTF-8 line by line, using something like the following code:
use Encode;
# outfile.txt is encoded in GB2312
open my $filter, "<", 'c:/outfile.txt' or die "Cannot open c:/outfile.txt: $!";
while (<$filter>) {
    # decode each GB2312 line of outfile.txt into a Perl character string
    $_ = Encode::decode("gb2312", $_);
    ...
}
But I think Perl should be able to convert the whole input file to UTF-8 directly, so I tried something like this:
# outfile.txt is encoded in GB2312
open my $filter,"<:utf8",'c:/outfile.txt';
(Perl says something like: utf8 "\xD4" does not map to Unicode)
and
open my $filter,"<",'c:/outfile.txt';
$filter = Encode::decode("gb2312", $filter);
(Perl says "readline() on unopened filehandle".)
Neither attempt works. Is there some way to convert the input file to UTF-8 directly?
Update:
Looks like things are not as simple as I thought. I can now convert the input file to UTF-8 in a roundabout way: I open the input file, write its decoded content out as UTF-8 to a new file, and then read the new file for further processing. This is the code:
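open my $filter, '<:encoding(gb2312)', 'c:/outfile.txt' or die "Cannot open c:/outfile.txt: $!";
open my $filter_new, '+>:utf8', 'c:/outfile_new.txt' or die "Cannot open c:/outfile_new.txt: $!";
print $filter_new $_ while <$filter>;
# rewind to the start of the new file before reading it back
seek $filter_new, 0, 0;
while (<$filter_new>) {
...
}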
But this is too much work, and it is even more troublesome than simply decoding the content of $filter line by line.