ansaurus

Question

How can I open a Unicode file with Perl?

Answer 1

+1 A:

Try opening the file with an I/O layer specified, e.g.:

open my $in, "<:encoding(UTF-8)", $file or die "Can't open $file: $!\n";

See perldoc -f open for more on this.
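As a rough illustration of what the encoding layer does, here is a minimal sketch. It assumes a UTF-8 file, which it creates itself as a temporary file so it can run on its own; the sample text and variable names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small UTF-8 sample file (illustrative content, not from the question).
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp, ':encoding(UTF-8)';
print {$tmp} "na\x{EF}ve caf\x{E9}\n";
close $tmp or die "Can't close $file: $!\n";

# Read it back through the decoding layer.
open my $in, '<:encoding(UTF-8)', $file or die "Can't open $file: $!\n";
my $line = <$in>;
close $in;
chomp $line;

# The layer has decoded the bytes, so length counts characters, not bytes.
printf "%d characters\n", length $line;
```

Without the layer, the two accented letters would each be read as two separate bytes, so the length would come out larger.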

eugene y 2010-03-17 11:26:14

Answer 2

+7 A:

The file is presumably in UCS-2LE (or UTF-16) format.

C:\Temp> notepad test.txt

C:\Temp> xxd test.txt
0000000: fffe 5400 6800 6900 7300 2000 6900 7300  ..T.h.i.s. .i.s.
0000010: 2000 6100 2000 6600 6900 6c00 6500 2e00   .a. .f.i.l.e...

When opening such a file for reading, you need to specify the encoding:

#!/usr/bin/perl

use strict; use warnings;

my ($infile) = @ARGV;

open my $in, '<:encoding(UCS-2le)', $infile
    or die "Cannot open '$infile': $!";

Note that the fffe at the beginning is the BOM (byte order mark); in this byte order it marks the data as little-endian.
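One detail worth checking is what happens to the BOM after decoding. The sketch below recreates the bytes from the hex dump above in a temporary file and reads them two ways. It relies on Encode's behavior that the plain UTF-16 encoding uses the BOM to pick the byte order and drops it, while UTF-16LE leaves it in the text as U+FEFF. The helper slurp_as is a hypothetical name used only in this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Recreate the dumped file: BOM (ff fe) followed by "This is a file."
# as little-endian 16-bit code units.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} "\xFF\xFE", map { "$_\0" } split //, 'This is a file.';
close $fh or die "Cannot close '$path': $!";

# slurp_as (hypothetical helper): read a whole file through an encoding layer.
sub slurp_as {
    my ($encoding, $file) = @_;
    open my $in, "<:encoding($encoding)", $file
        or die "Cannot open '$file': $!";
    local $/;                       # slurp mode
    my $text = <$in>;
    close $in;
    return $text;
}

my $utf16   = slurp_as('UTF-16',   $path);  # decoder consumes the BOM
my $utf16le = slurp_as('UTF-16LE', $path);  # BOM survives as U+FEFF
$utf16le =~ s/\A\x{FEFF}//;                 # so strip it by hand

binmode STDOUT, ':encoding(UTF-8)';
print "$utf16\n";
print $utf16 eq $utf16le ? "same text\n" : "different text\n";
```

If the layers behave as described, both reads end up with the same text. For files that always start with a BOM, plain UTF-16 is probably the more convenient choice, since it saves the manual strip.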

Sinan Ünür 2010-03-17 12:21:50

That was exactly what I was looking for when asking for the dump. :)

JUST MY correct OPINION 2010-03-17 13:25:31

Thanks - it was actually UTF-16.

Jaco Pretorius 2010-03-17 14:45:04

UCS-2le is very, very close to UTF-16: http://en.wikipedia.org/wiki/UTF-16/UCS-2

Robert P 2010-03-17 15:53:43

Answer 3

+4 A:

The answer is in the documentation for open, which also points you to perluniintro. :)

brian d foy 2010-03-17 14:24:24
