I want to read UTF-8 input in Perl, no matter if it comes from the standard input or from a file, using the diamond operator: while(<>){...}
.
So my script should be callable in these two ways, as usual, giving the same output:
./script.pl utf8.txt
cat utf8.txt | ./script.pl
But the outputs differ! Only the second call (using cat
) seems to work as designed, reading UTF-8 properly. Here is the script:
#!/usr/bin/perl -w
binmode STDIN, ':utf8';
binmode STDOUT, ':utf8';
while(<>){
my @chars = split //, $_;
print "$_\n" foreach(@chars);
}
How can I make it read UTF-8 correctly in both cases? I would like to keep using the diamond operator <>
for reading, if possible.
EDIT:
I realized I should probably describe the different outputs. My input file contains this sequence: a\xCA\xA7b
. The method with cat
correctly outputs:
a
\xCA\xA7
b
But the other method gives me this:
a
\xC3\x8A
\xC2\xA7
b