I'm reading a .csv file that was created in Excel; its first line holds the column headings. One column heading contains an embedded newline. I want to ignore that newline, but reading the file line by line like this:

while ( <IN> ) { 
    ...
    }

treats the embedded newline as the end of a line, which will break my code (which I haven't written yet). My plan was to read the first line into an array of column headings and process the rest of the lines differently.

Is there a regex I can use somewhere in the while loop that ignores a newline unless it's the last newline of the line?

Or should I be approaching this differently?

+13  A: 

Use one of the Perl modules that handle CSV, such as Text::CSV_XS. Its documentation shows you how to handle embedded newlines. In general, you don't want to spend your time writing another CSV parser; get on with the more important parts of your task!
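As a rough illustration of the advice above, here is a minimal sketch using Text::CSV_XS with its `binary` option, which allows quoted fields to contain newlines. The sample data, the column names, and the choice to replace the embedded newline with a space are assumptions made for the example; a real script would open the Excel-generated file instead of the in-memory string.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical sample data: the second heading contains an embedded
# newline, which Excel writes inside double quotes.
my $data = qq{Name,"Unit\nPrice",Qty\nWidget,1.50,3\n};

# binary => 1 permits newlines inside quoted fields;
# auto_diag => 1 reports parse errors instead of failing silently.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# Read from an in-memory filehandle so the sketch is self-contained.
open my $in, '<', \$data or die "Cannot open in-memory file: $!";

# getline() returns one CSV record, even if it spans several physical lines.
my $header = $csv->getline($in);

# Replace any embedded newlines in the headings with a single space.
my @columns = map { (my $h = $_) =~ s/[\r\n]+/ /g; $h } @$header;
$csv->column_names(@columns);

# Each remaining record comes back as a hash keyed by column heading.
while (my $row = $csv->getline_hr($in)) {
    print join(', ', map { "$_=$row->{$_}" } @columns), "\n";
}
close $in;
```

The point of the sketch is that the parser, not the `while ( <IN> )` loop, decides where a record ends, so the header needs no special-case regex.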

brian d foy
CSV parsing is surprisingly difficult, and for all but the most trivial code (i.e., as soon as you think things might break on `split ',', $line`) you should be using CPAN. Text::CSV_XS and Text::XSV are the two you should consider. I use the former due to inertia, but the latter is newer and probably better for many uses.
singingfish
I think you meant that as your own answer rather than a comment to mine.
brian d foy
I was told this wouldn't work, but I see there is a binary option that can be set. I'll give it a try. Thanks for the edit.
RH
@singingfish: ITYM Text::xSV. And Text::CSV_XS was stagnant for many years, but in the last two years H.M.Brand took over maintenance and has done a lot of work (and put out a lot of releases).
ysth