ansaurus

Question

Why doesn't reading a UTF-16LE file convert "\r\n" into "\n" on Windows?

Answer 1

A:

That is Windows performing that magic for you. If you specify a UTF encoding, it is the equivalent of opening the file in binary mode rather than text mode.

Newer versions of Perl (5.10 and later) have \R, a generic newline that matches \r\n as well as a lone \n or \r, and \v, which matches any single vertical whitespace character as defined by the OS and Unicode (\n, \r, form feed, vertical tab, the Unicode line and paragraph separators, etc.).

Does your regex logic allow using \R instead of \n?
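As a rough illustration of the \R suggestion, here is a minimal sketch that splits text with mixed line endings and strips a trailing line ending of either style. The sample strings are made up for the example, and it assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical sample data mixing Windows and Unix line endings.
my $text = "first\r\nsecond\nthird\r\n";

# \R matches \r\n as a single unit, as well as a lone \n or \r,
# so both styles of line ending act as separators here.
my @lines = split /\R/, $text;

# Strip one trailing line ending of either style from a single line.
(my $line = "value\r\n") =~ s/\R\z//;

print join('|', @lines), "\n";   # prints: first|second|third
print "[$line]\n";               # prints: [value]
```

Because \R treats \r\n as one unit, no stray \r is left at the end of each field, which is what usually breaks patterns written against \n alone.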

drewk 2010-04-13 03:54:30

I just use $ as an anchor of the end of line

lz_prgmr 2010-04-13 05:29:58

Answer 2

+1 A:

What version of Perl are you using? UTF-16 and CRLF handling did not mix properly before 5.8.9 (Unicode changes in 5.8.9). I'm not sure about 5.10.0, but it works in 5.10.1 and 5.8.9. You might need to use "<:encoding(UTF-16LE):crlf" when opening the file.
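For reference, a minimal sketch of how such a layer stack can be tried out. Note one deviation from the string above: it puts :raw in front, which is an assumption of this sketch rather than something stated in the thread. The idea is that :raw clears any default byte-level CRLF translation first, so that :crlf is applied on top of the decoded characters. The temporary file and its contents are hypothetical, and behavior on older Perl versions may differ.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical sample file: two CRLF-terminated lines encoded as UTF-16LE (no BOM).
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;                                   # write the bytes untouched
print {$tmp} encode('UTF-16LE', "a\r\nb\r\n");
close $tmp or die "close failed: $!";

# Layers apply left to right when reading:
#   :raw                 no byte-level newline translation
#   :encoding(UTF-16LE)  decode bytes into characters
#   :crlf                translate \r\n to \n on the decoded characters
open my $fh, '<:raw:encoding(UTF-16LE):crlf', $path
    or die "Cannot open $path: $!";
my @lines = <$fh>;
close $fh;

# Count how many lines still contain a carriage return.
my $with_cr = grep { /\r/ } @lines;
printf "%d lines, %d still containing CR\n", scalar(@lines), $with_cr;
```

If the CR count is not zero on a given Perl build, that points to the layer-ordering problem discussed here, and matching with \R or stripping \r by hand is a possible fallback.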

cjm 2010-04-13 04:42:57

"<:encoding(UTF-16LE):crlf" doesn't work either, even with the 5.10.1 version

lz_prgmr 2010-04-13 05:14:36

@cjm It appears broken in my testing on 5.10.1 as well (although admittedly I'm not on Windows, I'm just faking it with `PERLIO=crlf` :)

hobbs 2010-04-13 05:19:32

`"<:encoding(UTF-16LE):crlf"` definitely works for me (on Linux) with both 5.8.9 and 5.10.1. I only have 5.8.8 on Windows, and that does not work.

cjm 2010-04-13 05:44:22
