The utf8 pragma and utf8 encodings on filehandles have me confused. For example, this apparently straightforward code...
use utf8;
print qq[fü];
To be clear, the hex dump of "fü" is 66 c3 bc, which, if I'm not mistaken, is proper UTF-8.
The code above prints 66 fc, which is not UTF-8 but the bare Unicode code point, or maybe Latin-1. Turn off use utf8 and I get 66 c3 bc. This is the opposite of what I'd expect.
Now let's add an encoding layer to the filehandle.
use utf8;
binmode *STDOUT, ':encoding(utf8)';
print qq[fü];
Now I get 66 c3 bc. But remove use utf8 and I get 66 c3 83 c2 bc, which doesn't make any sense to me.
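The four results can be reproduced without depending on how the source file is saved. This is only a minimal sketch under two assumptions: that use utf8 turns the literal into the two characters U+0066 U+00FC while its absence leaves the three raw bytes, and that the :encoding(utf8) layer behaves like Encode::encode applied to the string on output. The helper hex_bytes is a hypothetical name introduced here for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: format a string as space-separated hex pairs,
# one pair per character (all values here are below 0x100).
sub hex_bytes {
    my ($str) = @_;
    return join ' ', map { sprintf '%02x', ord } split //, $str;
}

# Assumption: what the literal holds WITH "use utf8" (two characters).
my $chars = "f\x{fc}";
# Assumption: what the literal holds WITHOUT "use utf8" (three raw bytes).
my $bytes = "f\xc3\xbc";

# No layer on the handle: each value below 0x100 goes out as one byte.
print hex_bytes($chars), "\n";                    # 66 fc
print hex_bytes($bytes), "\n";                    # 66 c3 bc

# Emulating the :encoding(utf8) layer: every character is UTF-8 encoded,
# so the raw bytes c3 and bc are each encoded again as separate characters.
print hex_bytes(encode('UTF-8', $chars)), "\n";   # 66 c3 bc
print hex_bytes(encode('UTF-8', $bytes)), "\n";   # 66 c3 83 c2 bc
```

If those assumptions hold, the last line would be a double encoding: the bytes c3 and bc are treated as the characters U+00C3 and U+00BC and each is encoded to two bytes.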
What's the right thing to do to make my code DWIM with UTF-8?
PS: My locale is set to "en_US.UTF-8" and I'm using Perl 5.10.1.