OK, I have a CSV file like this:

14 ; 1234,56 ; 10203 ; "ABC" ; "DFG" ; "Lorem \n ipsum \n dolor sit" \n
15 ; 234,16 ; 10204 ; "ABC" ; "DFG" ; "Lorem \n ipsum \n dolor sit" \n
16 ; 1234,15 ; 10304 ; "CCC" ; "DFG" ; "Lorem ipsum/dolor \n sit amet\consec" \n

and so on...

The file has almost 550000 lines. How do I replace all \n characters inside double quotes at once?

I'm using PHP 5. Could it be done with preg_replace()?

A: 

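So do you actually have the literal two-character string '\n' (a backslash followed by n, not a newline character) on some lines? If so, you just need to escape the backslash so that PHP does not read it as a newline, and assign the result back:

$csv = str_replace("\\n", "*foo*", $csv);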

// this will make the following change:
14 ; 1234,56 ; 10203 ; "ABC" ; "DFG" ; "text \n text \n more text" \n
// that to this:
14 ; 1234,56 ; 10203 ; "ABC" ; "DFG" ; "text *foo* text *foo* more text" \n
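
Since the question asks about preg_replace() and about touching only the text inside double quotes, here is a minimal sketch of one way to restrict the replacement to quoted segments. The function names are hypothetical, the replacement is hard-coded to a single space, and the pattern assumes quotes are balanced and that embedded quotes are escaped by doubling them. It uses a named callback so it should also run on older PHP 5 versions that lack closures.

```php
<?php
// Callback for preg_replace_callback(): $m[0] holds one quoted segment,
// including its surrounding double quotes.
function quoted_newlines_to_spaces($m)
{
    // Replace real line breaks as well as the literal two-character sequence \n.
    return str_replace(array("\r\n", "\n", "\r", '\\n'), ' ', $m[0]);
}

// Hypothetical helper: leaves everything outside double quotes untouched.
function replace_newlines_in_quotes($csv)
{
    return preg_replace_callback('/"[^"]*"/', 'quoted_newlines_to_spaces', $csv);
}

// The first record has a real newline in its last field,
// the second has a literal backslash-n.
$csv = "14;\"ABC\";\"Lorem\nipsum\"\n15;\"CCC\";\"dolor\\nsit\"\n";
$result = replace_newlines_in_quotes($csv);
```

The line breaks that end each record sit outside the quotes, so they are kept. For a file of this size, reading it with fgetcsv() is likely to be gentler on memory than running one regex over the whole string.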
nickf
A: 

PHP has a function specifically for reading CSV files: fgetcsv().

Havenard
A: 

I'm not too well versed in extremely complex regexes, so assuming you're looking for a one-time conversion, I would write a quick script that opens the CSV in PHP, reads it record by record with fgetcsv() (built into PHP 5), str_replace()s the newline characters in each field, and writes the result to a new file with fputcsv().

(If I wasn't looking for the monster regex on stackoverflow, that is.)
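
A minimal sketch of that kind of one-time conversion script might look like the following. The function name, the file paths, and the choice of a single space as the replacement are all assumptions for illustration. Passing 0 as the length to fgetcsv() (meaning no limit) and using fputcsv() both require a reasonably recent PHP 5 release.

```php
<?php
// Hypothetical helper: copies $inPath to $outPath record by record,
// replacing line breaks inside fields with a space.
// Returns the number of records written, or false if a file cannot be opened.
function flatten_csv_newlines($inPath, $outPath, $delimiter = ';')
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        return false;
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        fclose($in);
        return false;
    }

    $records = 0;
    while (($fields = fgetcsv($in, 0, $delimiter)) !== false) {
        if ($fields === array(null)) {
            continue; // fgetcsv() reports a blank line as array(null)
        }
        foreach ($fields as $i => $field) {
            $fields[$i] = str_replace(array("\r\n", "\n", "\r"), ' ', $field);
        }
        fputcsv($out, $fields, $delimiter);
        $records++;
    }

    fclose($in);
    fclose($out);
    return $records;
}
```

Because only one record is held in memory at a time, this should cope with a file of several hundred thousand lines. Note that fputcsv() adds quotes only where a field needs them, so the quoting in the output may differ from the input.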

Ryan
+1  A: 

I don't know if you're using fgetcsv(), but you can configure it to recognize individual fields, including quoted ones.

This way you can read your lines in one at a time and strip the new line characters at the field level rather than having to do an expensive RegEx operation on a large file all at once.

Slightly modified PHP code example from the documentation (delimiter replaced with ';'):

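$row = 1;
$handle = fopen("data.txt", "r");
if ($handle === false) {
    die("Could not open data.txt");
}
// 1000 is the maximum record length; it must exceed the longest record in the file
// (newer PHP 5 versions also accept 0 for "no limit").
while (($data = fgetcsv($handle, 1000, ";")) !== false) {
    $num = count($data);
    echo "<p> $num fields in line $row: <br /></p>\n";
    $row++;
    // Print each field of the current record on its own line.
    for ($c = 0; $c < $num; $c++) {
        echo $data[$c] . "<br />\n";
    }
}
fclose($handle);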

data.txt

14 ; 1234,56 ; 10203 ; "ABC" ; "DFG" ; "text 
 text 
 more text"
15 ; 234,16 ; 10204 ; "ABC" ; "DFG" ; "text 
 text 
 more text"

This will be read as 2 records instead of 6 lines, because fgetcsv() treats the newline characters inside the quotes as part of the field rather than as the start of a new record.
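
The difference between physical lines and records can be checked with a short sketch like the one below. It writes a simplified copy of the sample to a temporary file (no spaces around the delimiters, which is an assumption made to keep the example unambiguous), then counts both.

```php
<?php
// Two records, each with a final field that spans three physical lines.
$sample = "14;1234,56;10203;\"ABC\";\"DFG\";\"text\n text\n more text\"\n"
        . "15;234,16;10204;\"ABC\";\"DFG\";\"text\n text\n more text\"\n";

$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, $sample);

// file() splits on every line break, whether or not it is inside quotes.
$physicalLines = count(file($path));

// fgetcsv() keeps line breaks inside quotes as part of the field.
$records = 0;
$handle = fopen($path, 'r');
while (($data = fgetcsv($handle, 0, ';')) !== false) {
    $records++;
}
fclose($handle);
unlink($path);

echo "$physicalLines physical lines, $records records\n";
```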

Crazy Joe Malloy
fgetcsv() works perfectly, thank you all! I'm new to PHP and didn't know this function. Anyway, I'd like to see the regex pattern ...
Acacio Nerull
Glad to be of service. I've only been exposed to fgetcsv() recently myself. I can't help you on the regex side of things: I can use them, but I'm not sure how to make one work for this situation.
Crazy Joe Malloy