views:

1289

answers:

6

I need a PHP solution to get rid of commas inside double quotes. I can't seem to figure out a solution using preg_replace.

I'm inserting data into a database from a text file that is coma delimited.

Certain columns from the text file contain multiple words that are surrounded in double quotes. Theses double quotes have comas inside, so when I attempt to insert the text into my database using coma delimited columns it reads in the quotes between the double quotes and turns them into columns.

Here is how a row in the text file looks:

partname,2035, "this is, desc,ription",qty, "another, description",

this is what happens when I attempt to separate it:

results partname, 2035, this is, desc, ription, qty, another, description, 

this is what I want:

partime, 2035, this is description, qty, another description,

As you can see, the parts surrounded in double quotes should not be split into separate columns. I don't know how to fix this; could someone point me in the right direction?

A: 

What you want is the str_getcsv() function. It's only available in PHP5.3.0+ though.

Matt
Hrrm. I have PHP version 5.2.5 and I don't think that upgrading to PHP 5.30 is an option. No other possible work arounds?
http://us2.php.net/fgetcsv has a csv_split function in the comments you can use. It does the same thing as the one posted below but the code that powers it is different. Try both, see which one works best for you.
Matt
cvs_split function does not automatically split the columns like I listed above.
A: 

Try this function to split a single line into raw values. It tries to cope with pretty irregular CSVs

function csv_split( $src, $comma = ',', $esc = '\\' ){
    $a = array();
    while( $src ){
     $c = $src{0};
     switch( $c ){
     // permit empty values
     case ',':
      $a[] = '';
      $src = substr( $src, 1 );
      continue 2;
     // ignore whitespace
     case ' ':
     case "\t":
      preg_match('/^\s+/', $src, $r );
      $src = substr( $src, strlen($r[0]) );
      continue 2;
     // quoted values
     case '"':
     case "'":
     case '`':
      $reg = sprintf('/^%1$s((?:\\\\.|[^%1$s\\\\])*)%1$s\s*(?:,|$)/', $c );
      break;
     // naked values
     default:
      $reg = '/^((?:\\\\.|[^,\\\\])*)(?:,|$)/';
      $c = ',';
     }
     if( preg_match( $reg, $src, $r ) ){
      $a[] = empty($r[1]) ? '' : str_replace( '\\'.$c, $c, $r[1] );
      $src = substr( $src, strlen($r[0]) );
      continue;
     }
     // else fail
     trigger_error("csv_split failure", E_USER_WARNING );
     break;
    }
    return $a;
}
Tim Whitlock
A: 

This may work for you:

$pieces = explode( '"', 'partname,2035, "this is, desc,ription",qty, "another, description",' );

// explode the ones we should split (will be even-# elements)
for ( $i=0; $i<= count($pieces); $i+=2 ) {
    $tmpArray  = explode(",", $pieces[$i]);
    foreach($tmpArray as $value) {
     if ( strlen(trim($value)) > 0 ) {
      $finalArray[] = $value;
     }
    }
}

// now add the ones we shouldn't split inside quotes (odd-# elements)
for ( $i=1; $i<= count($pieces); $i+=2 ) {
    if ( strlen(trim($pieces[$i])) > 0 ) { 
     $finalArray[] = $pieces[$i];
    }
}

// show the result
echo "<pre>";
print_r($finalArray);
echo "<pre>";
OneNerd
It does what I asked but I left out some information that I didn't think would be relevant. Here is a more complex row:"1-500-E1",29,1840,"09/06/09",3,,"T","F",51,0,0,1530.00,40,10,24,"TLER, 1 CAN, EXP 4PH",,0There should be 19 columns but it ends up being 15 with your code.
Are you looking to preserve the EMPTY elements? if so, just remove the trim() piece in each loop.
OneNerd
Well, trim is not the current problem, it's not reading in all the columns surrounded by quotes correctly. It completely skips the first column "1-500-e1" and a few others that have quotes.
hmm - when I run it, I get them all: '1-500-e1' is inside element 11, '09/06/09' in 12, 'T' in 13, 'F' in 14, and 'TLER, 1 CAN, EXP 4PH' in 15. Which ones are missing for you?
OneNerd
A: 

The general solution for the problem would be to escape the separator character while creating the line before assembling the coma separated data with a simple.

define(SEPARATOR,',');
$dataField = str_replace(SEPARATOR,'BSLASH'.SEPARATOR,$dataField);

Pls change BSLASH to backslash character (the comment input box is more intelligent than me as it behaves strangely when I try to write a backslash character and I can't figure out how to turn it off :) )

Of course it's not an option if you receive the data from 3rd party source.

Csaba Kétszeri
Well, I'm exporting the data from a very old database system. There is a few options like coma delimited, text in txt format and excel spreadsheet.
+3  A: 

Guys, PHP already has a function for this, fgetcsv (get comma sepparated values from files)

<?php
$r = array();
$fh = fopen('test.txt','r');
while($t = fgetcsv($fh)) {
    $r[] = $t;
}
var_dump($r);

Which is there since the PHP4 era.

Flavius
please read the original question CAREFULLY... :0
You don't need preg_replace() at all - it would only create needless overhead.
Flavius
A: 
function csv_string_to_array($str){
$expr="/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/";
$results=preg_split($expr,trim($str));
return preg_replace("/^\"(.*)\"$/","$1",$results);
}

That does the trick.

Paul Hachmang