I have a CSV file I'm importing but am running into an issue. The data is in the format:

TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4

I need to be able to explode on the commas that are not within the quotes...

+8  A: 

See the fgetcsv function.

If you already have a string, you can create a stream that wraps it and then use fgetcsv. See http://code.google.com/p/phpstringstream/source/browse/trunk/stringstream.php
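As a rough sketch of the same idea without an external class, PHP's built-in `php://memory` stream can wrap the string so that fgetcsv can read it. The helper name `csvLineFromString` is hypothetical, and the final `trim` is an assumption about what the asker wants, since fgetcsv itself keeps whitespace around fields.

```php
<?php
// Hypothetical helper: parse one CSV record held in a string by wrapping it
// in an in-memory stream and handing that stream to fgetcsv().
function csvLineFromString(string $line): array
{
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, $line);
    rewind($stream);

    // Length 0 means "no line length limit"; separator, enclosure and escape
    // are spelled out explicitly rather than relying on the defaults.
    $fields = fgetcsv($stream, 0, ',', '"', '\\');
    fclose($stream);

    // fgetcsv() returns false on failure; normalise that to an empty array.
    if ($fields === false) {
        return [];
    }

    // Assumption: surrounding whitespace is not wanted, so trim each field.
    return array_map('trim', $fields);
}

$row = csvLineFromString('TEST 690, "This is a test 1, 2 and 3" ,$14.95 ,4');
var_dump($row);
```

The commas inside the quoted field stay part of that field, so the row comes back as four elements rather than six.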

Artefacto
I'd rather use the regex because there is special functionality here
Webnet
Don't do it with regex. It's not as simple as it looks. You may have line breaks in the strings. You may have escaped characters.
Artefacto
Once the CSV is parsed (by fgetcsv) you can regex-process each individual field to your heart's content.
Roadmaster
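To illustrate that parse-first, regex-second order, here is a small sketch. The `$fields` array is assumed to be what a CSV parser returns for the sample row in the question (already trimmed), and the specific clean-up rules are made up for the example; the real "special functionality" would go in their place.

```php
<?php
// Assumed parser output for the sample row in the question, with
// surrounding whitespace already trimmed.
$fields = ['TEST 690', 'This is a test 1, 2 and 3', '$14.95', '4'];

// Hypothetical post-processing: strip everything except digits and the
// decimal point from the price, then cast price and quantity to numbers.
$price    = (float) preg_replace('/[^0-9.]/', '', $fields[2]);
$quantity = (int) $fields[3];

// Pull the trailing number out of the first column with a capture group.
$code = preg_match('/(\d+)$/', $fields[0], $m) ? (int) $m[1] : null;

var_dump($price, $quantity, $code);
```

Because each regex only ever sees one field, it never has to care about delimiters or quoting.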
+1  A: 

If you really want to do this by hand, here's a rough reference implementation I wrote to explode a complete line of CSV text into an array. Be warned: This code does NOT handle multiple-line fields! With this implementation, the entire CSV row must exist on a single line with no line breaks!

<?php
//-----------------------------------------------------------------------
function csvexplode($str, $delim = ',', $qual = "\"")
// Explode a single CSV string (line) into an array.
{
    $len = strlen($str);  // Store the complete length of the string for easy reference.
    $inside = false;  // Maintain state when we're inside quoted elements.
    $lastWasDelim = false;  // Maintain state if we just started a new element.
    $word = '';  // Accumulator for current element.
    $out = [];  // Collected elements; initialized so it is always defined.

    for($i = 0; $i < $len; ++$i)
    {
        // We're outside a quoted element, and the current char is a field delimiter.
        if(!$inside && $str[$i]==$delim)
        {
            $out[] = $word;
            $word = '';
            $lastWasDelim = true;
        } 

        // We're inside a quoted element, the current char is a qualifier, and the next char is a qualifier.
        elseif($inside && $str[$i]==$qual && ($i+1<$len && $str[$i+1]==$qual))
        {
            $word .= $qual;  // Add one qual into the element,
            ++$i; // Then skip ahead to the next non-qual char.
        }

        // The current char is a qualifier (so we're either entering or leaving a quoted element.)
        elseif ($str[$i] == $qual)
        {
            $inside = !$inside;
        }

        // We're outside a quoted element, the current char is whitespace and the 'last' char was a delimiter.
        elseif( !$inside && ($str[$i]==" ")  && $lastWasDelim)
        {
            // Just skip the char because it's leading whitespace in front of an element.
        }

        // Outside a quoted element and the current char is whitespace: check whether it is trailing whitespace.
        elseif(!$inside && ($str[$i]==" ")  )
        {
            // Look ahead for the next non-whitespace char.
            $lookAhead = $i+1;
            while(($lookAhead < $len) && ($str[$lookAhead] == " ")) 
            {
                ++$lookAhead;
            }

            // If we hit the end of the line or the next char is formatting, we're dealing with trailing whitespace.
            if($lookAhead >= $len || $str[$lookAhead] == $delim || $str[$lookAhead] == $qual) 
            {
                $i = $lookAhead-1;  // Jump the pointer ahead to right before the delimiter, qualifier, or end of line.
            }

            // Otherwise we're still in the middle of an element, so add the whitespace to the output.
            else
            {
                $word .= $str[$i];  
            }
        }

        // If all else fails, add the character to the current element.
        else
        {
            $word .= $str[$i];
            $lastWasDelim = false;
        }
    }

    $out[] = $word;
    return $out;
}


// Examples:

$csvInput = 'Name,Address,Phone
Alice,123 First Street,"555-555-5555"
Bob,"345 Second Place,   City  ST",666-666-6666
"Charlie ""Chuck"" Doe",   3rd Circle   ,"  777-777-7777"';

// explode() emulates file() in this context.
foreach(explode("\n", $csvInput) as $line)
{
    var_dump(csvexplode($line));
}
?>

I would still recommend relying on PHP's built-in function, though. That's (hopefully) going to be far more reliable in the long term. Artefacto and Roadmaster are right: anything you have to do to the data is best done after you import it.
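As a quick illustration of why the built-in is the safer choice, this sketch feeds fgetcsv a record whose quoted field spans two lines, which is exactly the case a line-by-line parser cannot handle. The sample data is made up, and the `php://memory` stream simply stands in for a real file handle.

```php
<?php
// Made-up sample: the second record has a quoted field containing a
// line break.
$csvInput = "Name,Address\nAlice,\"123 First Street\nApt 4\"\nBob,345 Second Place\n";

// Stand-in for fopen() on a real CSV file.
$stream = fopen('php://memory', 'r+');
fwrite($stream, $csvInput);
rewind($stream);

// Read record by record; fgetcsv() returns false at end of input.
$rows = [];
while (($fields = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
    $rows[] = $fields;
}
fclose($stream);

var_dump($rows);
```

Here fgetcsv keeps reading past the embedded line break until the closing quote, so the input yields three records rather than four.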

beporter
A: 

Worth a look: dbTube.org

Timo Hellhagen