I'm trying to write to a Unicode (UCS-2 Little Endian) file in Perl on Windows, like this.

open my $f, ">$fName" or die "can't write $fName\n";
binmode $f, ':raw:encoding(UCS-2LE)';
print $f "ohai\ni can haz unicodez?\nkthxbye\n";
close $f;

It basically works, except that I no longer get the automatic LF -> CR/LF translation on output that I get with regular text files (the output files contain only LF). If I leave out :raw or add :crlf in the "binmode" call, the output file is garbled. I've tried reordering the layers (i.e. :encoding before :raw) and can't get it to work. The same problem exists for reading.

+1  A: 

The :crlf layer does a simple byte mapping of 0x0A -> 0x0D 0x0A (\n --> \r\n) in the output stream, but for the most part this isn't valid in any wide character encoding.
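To make the byte-mapping problem concrete, here is a minimal sketch that imitates it with the core Encode module rather than with real PerlIO layers. The substitution on the encoded bytes only simulates what a byte-level :crlf would do; the helper name hex_bytes is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustration helper (hypothetical name): show a byte string as hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
}

my $line = "a\n";

# UCS-2LE encodes every character as two bytes: 61 00 0A 00.
my $encoded = encode('UCS-2LE', $line);

# Simulate LF -> CR LF applied to the bytes AFTER encoding.
# The inserted 0D has no 00 partner, so everything after it is misaligned.
(my $mangled = $encoded) =~ s/\x0A/\x0D\x0A/g;

# Translate at the character level first, then encode.
(my $crlf_line = $line) =~ s/\n/\r\n/g;
my $correct = encode('UCS-2LE', $crlf_line);

print hex_bytes($mangled), "\n";   # 61 00 0D 0A 00
print hex_bytes($correct), "\n";   # 61 00 0D 00 0A 00
```

The first result has an odd number of bytes, which is why a viewer that pairs them up shows the following line as unrelated characters.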

How about using raw mode but explicitly printing the CR?

print $f "ohai\r\ni can haz unicodez?\r\nkthxbye\r\n";
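If the text arrives with plain "\n" line endings, the same idea can be wrapped in a small helper that converts them just before printing. This is only a sketch: the function name write_ucs2le_crlf and the file name test.txt are made up for the example, and it assumes the whole text fits in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: write $text to $path as UCS-2LE with CR LF line ends.
sub write_ucs2le_crlf {
    my ($path, $text) = @_;

    # Do the translation on characters, before any encoding happens.
    # \r?\n leaves already-correct CR LF pairs as they are.
    $text =~ s/\r?\n/\r\n/g;

    # :raw disables any platform newline translation; :encoding does the rest.
    open my $fh, '>:raw:encoding(UCS-2LE)', $path
        or die "can't write $path: $!\n";
    print {$fh} $text;
    close $fh or die "can't close $path: $!\n";
    return;
}

write_ucs2le_crlf('test.txt', "ohai\ni can haz unicodez?\nkthxbye\n");
```

Because the substitution runs on the character string, it does not depend on how the :crlf and :encoding layers are stacked.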

Or if portability is a concern, discover and explicitly use the correct line ending:

## never mind - $/ doesn't work
# print $f "ohai$/i can haz unicodez?$/kthxbye$/";

open DUMMY, '>', 'dummy'; print DUMMY "\n"; close DUMMY;
open DUMMY, '<:raw', 'dummy'; $EOL = <DUMMY>; close DUMMY;
unlink 'dummy';

...

print $f "ohai${EOL}i can haz unicodez?${EOL}kthxbye${EOL}";
mobrule
There's no way to make the :crlf layer work before the :encoding layer?
JoelFan
Not as far as I know. Maybe somebody else has an idea.
mobrule
+1  A: 

This works for me on windows:

open my $f, ">:encoding(UCS-2LE):crlf", "test.txt";
print $f "ohai\ni can haz unicodez?\nkthxbye\n";
close $f;

Yielding UCS-2 LE output in test.txt of

ohai
i can haz unicodez?
kthxbye
dsolimano
Really?! The 2nd line looks like Asian characters when I open that in Notepad++ (which has Unicode support). And when I open it in HexEdit, I can see why: the lines end in \x00 \x0D \x0A (instead of \x00 \x0D \x00 \x0A), which puts the second line "out of sync".
JoelFan
I see the line ending in \x00 \x0D \x00 \x0A, or as hexlify-buffer puts it, `0d00 0a00`. What version of perl are you using? I'm using the latest strawberry distribution.
dsolimano
Wow, this is weird... I am also using the latest Strawberry Perl (5.12), and I just tried it again to make sure... I am still seeing 00 0D 0A... Maybe you are using 5.10 and this is a bug introduced in 5.12?
JoelFan
`This is perl 5, version 12, subversion 0 (v5.12.0) built for MSWin32-x86-multi-thread`, not that this helps us much. I wonder if I have some sort of module installed that's affecting encoding and crlf. I'll try from another computer or two when I have access and post results.
dsolimano
Can others try this?
JoelFan
Maybe the OS matters? I am on Windows 7 64-bit (although I am using MSWin32-x86-multi-thread same as you)
JoelFan
Mysterious - I can repro your problem on another 32 bit XP machine with the same perl version. I am at a loss as to what is going on, but it doesn't look like this works in general.
dsolimano
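For anyone who wants to compare machines, a small script along these lines might help. It is a sketch, not a diagnosis: it writes one line through the layer string from the answer above, then prints the perl version, the layer stack reported by PerlIO::get_layers, and the raw bytes that ended up in the file. The scratch file name crlf_check.txt is made up for the example.

```perl
use strict;
use warnings;

my $file = 'crlf_check.txt';   # hypothetical scratch file name

# Same layer string as in the answer above.
open my $out, '>:encoding(UCS-2LE):crlf', $file
    or die "can't write $file: $!\n";
my @layers = PerlIO::get_layers($out);   # layer stack, bottom to top
print {$out} "ohai\n";
close $out or die "can't close $file: $!\n";

# Read the file back untouched so we see exactly what was written.
open my $in, '<:raw', $file or die "can't read $file: $!\n";
my $bytes = do { local $/; <$in> };
close $in;
unlink $file;

print "perl $^V on $^O\n";
print "layers: @layers\n";
print "bytes:  ",
    join(' ', map { sprintf('%02X', ord($_)) } split(//, $bytes)), "\n";
```

The result depends on the platform and build, so no expected output is given here. A well-formed file would end in 0D 00 0A 00, while the failure described in the comments would show a 0D 0A pair with no 00 between the two bytes.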