Hello all,

I am trying to write a regex that will let me parse the CSV files that Excel creates. I have noticed that when you export a CSV from Excel, any field that is a string is enclosed in quotes. If that string itself contains quotes, each one is escaped with another quote.

What I want to do is split each line that I parse into fields. In light of the above, I have to split on every comma that is not within quotes. My regex skills are terrible, so how would I do this?

I can split on a comma, but how do I say "only when it's not between quotes"?

$lines = file($toce_path);

foreach ($lines as $line) {
    $line_array = preg_split("/,/", $line);
    $test = "($line_array[0], $line_array[1], $line_array[2])";
    echo $test.'<br />';
}

This question is exactly like mine, but its answer doesn't work with preg_split, which requires Perl-compatible regular expression (PCRE) syntax.

Thanks all for any help

+4  A: 

Not exactly answering your question, but maybe solving your problem:

Have you tried fgetcsv() or str_getcsv()?

They're your best friends if you're dealing with CSV data.
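As a rough illustration of that suggestion, here is a minimal sketch using str_getcsv() on one line. The sample string is made up for the example, and passing an empty escape string assumes PHP 7.4 or later; it switches off the backslash escape so that only doubled quotes count as escapes, which matches what the question describes.

```php
<?php
// Made-up sample line: a plain field, a quoted field containing a comma,
// and a quoted field containing doubled (escaped) quotes.
$line = 'some,"text, here","say ""hi"""';

// Delimiter ',', enclosure '"'. The empty escape string (PHP 7.4+, an
// assumption of this sketch) disables the backslash escape mechanism.
$fields = str_getcsv($line, ',', '"', '');

print_r($fields);
```

The enclosing quotes are stripped and the doubled quotes are collapsed, so no regex is needed for the split.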

timdev
+1  A: 

This expression works in .NET, whose regex flavor is broadly Perl-compatible: (?<!\"\w*), (PCRE, which preg_split uses, rejects it because the lookbehind has no fixed length.)

Input: some, "text, here" returns the match only on the comma after some.

AllenG
That would also split inside `"multiple words, here"`, but would not split `"wont"split,here`. There are ways to trick regex into finding tokens between quotes, but this isn't a good one, I'm afraid.
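For reference, one commonly used pattern of that kind (not necessarily the one meant here) is a lookahead that accepts a comma only when an even number of quotes follows it up to the end of the line. The sketch below assumes every quoted field is properly closed; the sample line is made up.

```php
<?php
// Made-up sample line combining the cases discussed above.
$line = 'some,"text, here","wont"split,here';

// Split on a comma only if the rest of the line contains an even number
// of double quotes, i.e. the comma is not inside a quoted field.
$pattern = '/,(?=(?:[^"]*"[^"]*")*[^"]*$)/';

$parts = preg_split($pattern, $line);

print_r($parts);
```

The enclosing quotes and any doubled quotes stay in the resulting fields, so they still have to be cleaned up afterwards, which is one reason the dedicated CSV functions are usually the simpler route.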
Kobi
+1  A: 

Why don't you use PHP's built-in function?

http://php.net/manual/en/function.fgetcsv.php
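A minimal sketch of how the loop from the question might look with fgetcsv(). The temporary file and its contents are made up so the example runs on its own; in practice $toce_path would point at the real Excel export. The empty escape argument again assumes PHP 7.4 or later.

```php
<?php
// Hypothetical stand-in for the Excel export referenced by $toce_path.
$toce_path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents(
    $toce_path,
    'a,"b, c",d' . "\n" . '1,"say ""hi""",3' . "\n"
);

$rows = [];
$handle = fopen($toce_path, 'r');

// Length 0 means no line-length limit; fgetcsv() returns false at end of file.
while (($fields = fgetcsv($handle, 0, ',', '"', '')) !== false) {
    $rows[] = $fields;
}

fclose($handle);
unlink($toce_path); // remove the temporary sample file

print_r($rows);
```

Each element of $rows is the array of fields for one line, with quoting already resolved.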

ghoppe