I'm trying to read, on Linux, a CSV file generated by MS Excel.

The file has quoted multi-line columns (lines within a cell separated by 0x0A) and uses 0x0D0A as the line terminator.

PHP on Linux uses 0x0A as the line terminator, so all the line-based tools (file, fgets, fgetcsv) think there are record breaks in the middle of the data cells.

Short of processing the file byte by byte, can I temporarily change PHP's end-of-line character (the PHP_EOL constant) so I can easily parse the file?

I think it can be done in Perl with "$/" (the input record separator). Is there something similar in PHP?

I realize I can parse byte by byte, but I'm looking for a cleaner approach.

+1  A: 

You might try the 'auto_detect_line_endings' run-time configuration option. According to the documentation, it makes PHP work out the correct line endings automatically. From the docs:

When turned on, PHP will examine the data read by fgets() and file() to see if it is using Unix, MS-Dos or Macintosh line-ending conventions.

This enables PHP to interoperate with Macintosh systems, but defaults to Off, as there is a very small performance penalty when detecting the EOL conventions for the first line, and also because people using carriage-returns as item separators under Unix systems would experience non-backwards-compatible behaviour.

If that doesn't work then you could always read the entire file into memory (depending on the file size this might not be feasible) and do a preg_replace on the characters in question, replacing them with the "correct" characters.
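A minimal sketch of the preg_replace idea, assuming the only bare 0x0A bytes in the file are the ones inside quoted cells and that it is acceptable to flatten them to a space. The helper name replaceEmbeddedLineFeeds() and the inline sample data are hypothetical; in practice $raw would come from reading the file into memory.

```php
<?php
// Hypothetical helper: replace every line feed that is NOT part of a CRLF pair.
function replaceEmbeddedLineFeeds(string $raw, string $replacement = ' '): string
{
    // (?<!\r)\n matches a line feed that is not preceded by a carriage return,
    // so the CRLF record terminators are left untouched.
    return preg_replace('/(?<!\r)\n/', $replacement, $raw);
}

// Inline sample standing in for the file contents (assumption for illustration).
$raw = "id,note\r\n1,\"first line\nsecond line\"\r\n2,plain\r\n";
$clean = replaceEmbeddedLineFeeds($raw);
```

Note that this discards the line breaks inside the cells. If they have to survive, a placeholder string could be passed as $replacement and swapped back after parsing.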

conceptDawg
My suspicions were correct. auto_detect_line_endings didn't work in this case. Trying some of the others.
rmeden
+1  A: 

If conceptDawg's suggestion of auto_detect_line_endings doesn't work, I would recommend reading in the entire file via file_get_contents() and then calling explode() to break it up into records. You can pass whatever delimiter string you want to explode(), such as "\r\n".
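As a rough sketch of that approach: split the data on "\r\n" only, then hand each record to str_getcsv(), which keeps a bare line feed inside a quoted cell as part of that cell. The helper name parseCrlfCsv() and the sample data are hypothetical, and the sketch assumes no quoted cell contains a full CRLF pair, since explode() would cut such a cell in two.

```php
<?php
// Hypothetical helper: split on CRLF only, then parse each record as CSV.
function parseCrlfCsv(string $raw): array
{
    $rows = [];
    foreach (explode("\r\n", $raw) as $record) {
        if ($record === '') {
            continue; // skip the empty piece left after the final CRLF
        }
        // Separator, enclosure and escape are passed explicitly.
        $rows[] = str_getcsv($record, ',', '"', '\\');
    }
    return $rows;
}

// In real use this would be the result of file_get_contents() on the CSV file;
// an inline sample is used here so the sketch runs on its own.
$raw = "id,note\r\n1,\"first line\nsecond line\"\r\n2,plain\r\n";
$rows = parseCrlfCsv($raw);
```

Because the whole file is held in memory, this suits exports of modest size; for very large files a streaming approach would be needed instead.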

Josh
thanks.. explode worked great... ( I really need to read up on my function list... I didn't know about that one )
rmeden