
I am trying to write a Perl script using the "utf8" pragma, and I'm getting unexpected results. I'm using Mac OS X 10.5 (Leopard), and I'm editing with TextMate. All of my editor and operating system settings default to writing files in UTF-8.

However, when I enter the following into a text file, save it as a ".pl", and execute it, I get the friendly "diamond with a question mark" in place of the non-ascii characters.

#!/usr/bin/env perl -w

use strict;
use utf8;

my $str = 'Çirçös';
print( "$str\n" );

Any idea what I'm doing wrong? I expect to get 'Çirçös' in the output, but I get '�ir��s' instead.

+28  A: 

use utf8; does not enable Unicode output - it enables you to type Unicode in your program. Add this to the program, before your print() statement:

binmode(STDOUT, ":utf8");

See if that helps. That should make STDOUT output UTF-8 instead of raw Latin-1 bytes.
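For reference, here is a minimal sketch of the question's script with this fix applied. It assumes, as in the question, that the file is saved as UTF-8 and the terminal expects UTF-8. It also swaps the -w on the shebang line for "use warnings", because some systems (Linux, for example) hand "perl -w" to env as a single argument.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                   # decode the string literals in this file from UTF-8

binmode(STDOUT, ":utf8");   # encode characters as UTF-8 when printing to STDOUT

my $str = 'Çirçös';         # 6 characters (not 9 bytes) thanks to "use utf8"
print "$str\n";
```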

Chris Lutz
Update: Tested (I'm on Leopard too - awesome!) and works.
Chris Lutz
I didn't know about this (I've only been putting UTF8 in a database, never printing it). +1.
Paul Tomblin
That worked, Chris. Thank you!
You're welcome. See also another correct answer: http://stackoverflow.com/questions/627661/writing-perl-code-in-utf8/627975#627975 and remember, TMTOWTDI. And @Paul - if you're writing UTF-8 to a file, you should probably use binmode() on that filehandle and make it "proper" UTF-8, but if it works..
Chris Lutz
...don't fix it, eh?
Chris Lutz
other ways: the open pragma ( http://search.cpan.org/perldoc/open ), the -C switch ( http://perldoc.perl.org/perlrun.html#-C )
ysth
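As a sketch of the -C switch and its environment-variable counterpart PERL_UNICODE, the script below runs child perls (via $^X, the path of the running perl) and dumps the bytes each one writes for chr(0xE9), "é". It assumes a Unix-like shell behind the backticks. Setting PERL_UNICODE in the environment is one way to get the -C behaviour without touching the command line at all.

```perl
use strict;
use warnings;

# An inherited PERL_UNICODE would skew the first run, so drop it.
delete $ENV{PERL_UNICODE};

# chr 0xE9 is in the Latin-1 range.
my $plain  = `$^X -e "print chr 0xE9"`;      # no -C: written as one raw byte
my $with_c = `$^X -CS -e "print chr 0xE9"`;  # -CS: UTF-8 layer on STDIN/STDOUT/STDERR

my $with_env = do {
    local $ENV{PERL_UNICODE} = 'S';          # same setting via the environment
    `$^X -e "print chr 0xE9"`;
};

printf "no -C:          %vx\n", $plain;      # e9
printf "-CS:            %vx\n", $with_c;     # c3.a9
printf "PERL_UNICODE=S: %vx\n", $with_env;   # c3.a9
```

Note that perlrun says a -C given on the #! line must also be given on the command line (since Perl 5.10.1), because the standard streams are already set up by the time the #! line is read.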
I shy away from the -C switch because not all Perls (e.g. ActivePerl) can process command-line switches well (to my knowledge).
Chris Lutz
FWIW, here is the reason: strings that contain only Latin-1 (ISO-8859-1) characters, despite being stored more or less in UTF-8 internally, are output as Latin-1 by default. This way, scripts from the pre-Unicode era still work the same, even with a Unicode-aware perl.
mirod
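To see this concretely, this sketch prints the same Latin-1-range string through an in-memory filehandle with and without an encoding layer, then dumps the bytes in hex. The unencoded bytes C7, E7 and F6 are not valid UTF-8 on their own, which is why a UTF-8 terminal shows '�ir��s'. It assumes the file is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;

my $str = 'Çirçös';    # every character is below U+0100

# No encoding layer: each Latin-1-range character is written as a single byte.
open my $raw, '>', \my $raw_bytes or die "cannot open in-memory handle: $!";
print {$raw} $str;
close $raw;

# A :encoding(UTF-8) layer turns the same characters into UTF-8 sequences.
open my $enc, '>:encoding(UTF-8)', \my $utf8_bytes
    or die "cannot open in-memory handle: $!";
print {$enc} $str;
close $enc;

printf "raw:   %vX\n", $raw_bytes;    # C7.69.72.E7.F6.73
printf "UTF-8: %vX\n", $utf8_bytes;   # C3.87.69.72.C3.A7.C3.B6.73
```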
The utf8 pragma does not let you write your source in Unicode; it forces Perl to interpret your source as the UTF-8 (or UTF-EBCDIC) encoding of Unicode, an important distinction.
Chas. Owens
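The distinction can be checked with length: the same literal is 9 bytes when Perl reads the source as bytes and 6 characters under "use utf8". Because "use utf8" and "no utf8" are lexically scoped, this sketch switches them per block. It assumes the file itself is saved as UTF-8.

```perl
use strict;
use warnings;

my $as_bytes = do { no utf8;  'Çirçös' };  # literal kept as the raw UTF-8 bytes from the file
my $as_chars = do { use utf8; 'Çirçös' };  # literal decoded from UTF-8 into characters

printf "without utf8: %d\n", length $as_bytes;   # 9
printf "with utf8:    %d\n", length $as_chars;   # 6

# "use utf8" amounts to decoding the literal from UTF-8 yourself.
my $decoded = $as_bytes;
utf8::decode($decoded);
print "same string: ", ($decoded eq $as_chars ? "yes" : "no"), "\n";   # yes
```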
A: 

Redirect the output to a text file and try that in an editor. If it displays fine there then your terminal's at fault.

Ant P.
No, the Leopard terminal has $LANG set to "en_US.UTF-8" by default. It's just that, by default (for backwards compatibility - blek), Perl outputs characters 128-255 as single Latin-1 bytes instead of UTF-8, which the terminal then shows as ?, unless you specifically tell it not to.
Chris Lutz
A: 

Run this in your shell: $ env | grep LANG

This will probably show that your shell is not using a utf-8 locale.
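The same check can be made from inside Perl. This sketch uses the core POSIX and I18N::Langinfo modules to report the character set the current locale asks for. It only tells you what the locale expects; it does not change what Perl writes to STDOUT.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);

# Adopt the character-type locale from the environment (LC_ALL / LC_CTYPE / LANG).
setlocale(LC_CTYPE, "");

my $lang    = defined $ENV{LANG} ? $ENV{LANG} : '(unset)';
my $codeset = langinfo(CODESET);              # e.g. "UTF-8" under en_US.UTF-8
my $is_utf8 = ($codeset =~ /^utf-?8$/i);

print "LANG=$lang\n";
print "codeset=$codeset\n";
print "UTF-8 locale: ", ($is_utf8 ? "yes" : "no"), "\n";
```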

Actually, it was set to UTF-8. The problem was that I was printing to STDOUT without setting its binmode to UTF-8.
This would be an orthogonal concern. You need your Perl script to output correct data before you can worry about how your terminal emulator interprets it.
jrockway
+14  A: 

You can use the open pragma.

For example, the line below sets STDIN, STDOUT and STDERR to use UTF-8:

use open qw/:std :utf8/;
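A minimal sketch of the question's script using the open pragma instead of binmode, assuming the source is saved as UTF-8. The loop at the end uses PerlIO::get_layers to show which layers ended up on each standard handle; the exact list varies by platform, but for an open handle it should include "utf8".

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                  # the literal below is UTF-8 source text
use open qw/:std :utf8/;   # default layer for open(); :std also applies it to STDIN/STDOUT/STDERR

my $str = 'Çirçös';
print "$str\n";

# Show the layers actually pushed onto the standard handles.
my %std = (STDIN => \*STDIN, STDOUT => \*STDOUT, STDERR => \*STDERR);
for my $name (sort keys %std) {
    my @layers = PerlIO::get_layers($std{$name});
    print "$name layers: @layers\n";
}
```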

/I3az/

draegtun
Also good. I would +1 but I'm out of votes for today.
Chris Lutz
Well you can always give it to me later ;-)
draegtun
BTW... I gave you +1. I think binmode(STDOUT, ':utf8') is probably more correct in this situation. "use open" has other good uses, but I can't seem to find how you can set it to encode STDOUT only.
draegtun
@draegtun - Done, sir!
Chris Lutz
A: 

What output were you expecting? Did you want the output escaped, like URI::Escape might do?
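If escaped output is what you want, one core-only approach is to replace each non-ASCII character with a \x{...} escape, so the output is plain ASCII whatever the terminal's encoding. escape_non_ascii is a hypothetical helper written for this sketch, not part of any module; for percent-encoding URLs, URI::Escape's uri_escape_utf8 is the CPAN route.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper: turn every character outside ASCII into a \x{HEX} escape.
sub escape_non_ascii {
    my ($s) = @_;
    $s =~ s/([^\x00-\x7F])/sprintf('\\x{%X}', ord $1)/ge;
    return $s;
}

my $escaped = escape_non_ascii('Çirçös');
print "$escaped\n";   # \x{C7}ir\x{E7}\x{F6}s
```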

+5  A: 

TMTOWTDI; choose the method that best fits how you work. I use the environment method so I don't have to think about it.

In the environment:

export PERL_UNICODE=SDL

on the command line:

perl -CSDL -le 'print "\x{1815}"'

or with binmode:

binmode(STDOUT, ":utf8");          #treat as if it is UTF-8
binmode(STDIN, ":encoding(utf8)"); #actually check if it is UTF-8

or with PerlIO:

open my $out, ">:utf8", $filename
    or die "could not open $filename: $!\n";

open my $in, "<:encoding(utf-8)", $filename
    or die "could not open $filename: $!\n";

or with the open pragma:

use open ":encoding(utf8)";
use open IN => ":encoding(utf8)", OUT => ":utf8";
Chas. Owens
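Pulling the PerlIO options together, this sketch writes the string to a file through a :encoding(UTF-8) layer, reads the raw bytes back to see what landed on disk, then reads it again through a decoding layer. It assumes the source is saved as UTF-8; the File::Temp scratch file stands in for a real $filename.

```perl
use strict;
use warnings;
use utf8;
use File::Temp qw(tempfile);

my ($tmp_fh, $filename) = tempfile(UNLINK => 1);   # scratch file, removed at exit
close $tmp_fh;

my $str = 'Çirçös';

# Write: the :encoding(UTF-8) layer encodes characters to UTF-8 bytes.
open my $out, '>:encoding(UTF-8)', $filename
    or die "could not open $filename: $!\n";
print {$out} "$str\n";
close $out or die "could not close $filename: $!\n";

# Read the raw bytes back to see what actually landed on disk.
open my $raw, '<:raw', $filename or die "could not open $filename: $!\n";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read again through a decoding layer to get characters back.
open my $in, '<:encoding(UTF-8)', $filename
    or die "could not open $filename: $!\n";
chomp(my $line = <$in>);
close $in;

printf "bytes on disk: %d\n", length $bytes;   # 10 on Unix (9 for the text + newline)
printf "characters:    %d\n", length $line;    # 6
print "round trip ok: ", ($line eq $str ? "yes" : "no"), "\n";
```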