I need to process a file with Shift_JIS encoding. However, the line terminators are in a format that I'm not familiar with.

> file record.CSV 
record.CSV: Non-ISO extended-ASCII text, with CRLF, NEL line terminators

I'm using the usual idiom:

open my $CSV_FILE, "<:encoding(shift_jis)", $filename or die "Could not open: $filename : $!";
while (<$CSV_FILE>) {
    chomp;
    # do stuff
}

However, this still leaves a CR at the end of each record.

What is the correct way to handle the line terminators in files of this type?

+1  A: 

Why not remove the CR manually with s/\r$// after the chomp?
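As a minimal sketch of that manual approach: the helper name strip_eol and the in-memory sample data are made up for illustration, and the sample is plain ASCII so that no encoding layer is needed to show the idea.

```perl
use strict;
use warnings;

# Hypothetical helper: remove one trailing LF, CRLF, or lone CR from a line.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z|\r\z//;
    return $line;
}

# Demo on an in-memory "file" with CRLF terminators (ASCII sample data).
my $data = "a,b\r\nc,d\r\n";
open my $fh, '<', \$data or die "Could not open in-memory file: $!";
my @records;
while (my $line = <$fh>) {
    push @records, strip_eol($line);
}
close $fh;
print join('|', @records), "\n";
```

With a real file, the same helper could replace the chomp inside the original while loop.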

Edit: apparently, you can also do

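# Note: the substitution itself is plain Perl and needs neither module.
require Encode;
use Unicode::Normalize;

s/\x{0085}//g;   # operates on $_, which should already be decoded text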

to remove the NEL (Next Line, U+0085) characters.
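Before stripping anything, it may be worth checking whether the decoded text contains any U+0085 at all. In Shift_JIS the byte 0x85 can be the second byte of a two-byte character, so a tool that does not recognise the encoding could plausibly report such bytes as NEL. That is an assumption about this particular file, not something the question confirms. The sketch below uses a made-up helper, count_nel, and made-up sample bytes to compare the raw byte count with the decoded character count.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: count U+0085 (NEL) characters in already-decoded text.
sub count_nel {
    my ($text) = @_;
    my $count = () = $text =~ /\x{0085}/g;
    return $count;
}

# Sample bytes (an assumption for illustration): 0x89 0x85 is a single
# two-byte Shift_JIS character whose second byte is 0x85, followed by CRLF.
my $bytes     = "\x89\x85\r\n";
my $raw_count = ($bytes =~ tr/\x85//);      # count raw 0x85 bytes
my $text      = decode('shiftjis', $bytes); # decode bytes into characters

printf "raw 0x85 bytes: %d, decoded NEL characters: %d\n",
    $raw_count, count_nel($text);
```

If the decoded count is zero for the real file, only the CR needs handling and the s/\x{0085}//g step has nothing to remove.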

Pedro Silva
A: 

You need to consider who's consuming the data and learn more about the environment which produced these files. If it's a plain-vanilla CSV output file you're after in the end, use any old string manipulation you like to get rid of them (and produce CRLF terminators in their stead) and you'll be fine.
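A rough sketch of that string manipulation, assuming the text has already been decoded from Shift_JIS into Perl characters (running it on raw bytes could damage two-byte characters). The helper name to_crlf is hypothetical, and treating U+0085 as a line terminator is an assumption about the data.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every line terminator in decoded text as CRLF.
# "\r\n" is listed first so an existing CRLF is matched as a single unit.
sub to_crlf {
    my ($text) = @_;
    $text =~ s/\r\n|\r|\n|\x{0085}/\r\n/g;
    return $text;
}

# Demo with mixed terminators (ASCII sample data plus one U+0085).
my $mixed      = "a\r\nb\nc\rd\x{0085}e";
my $normalized = to_crlf($mixed);
my @fields     = split /\r\n/, $normalized;
print scalar(@fields), " records\n";
```

When writing the result back out, opening the output handle with a layer such as ">:raw:encoding(shift_jis)" should avoid a platform newline translation adding a second CR.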

fennec