
Hi,

I have a .txt file with product data, which I want to read in PHP. Each line contains one product, and the product details (number, name and price) are separated by tabs. As you can see below, the prices are not always aligned vertically, because the product names differ in length. The data look like this:

ABC001  an item description   $5.50
XYZ999  an other item    $6
PPP000  yet another one  $8.99
AKA010  one w a longer name   $3.33
J_B007  a very long name, to show tabs  $99

(I didn't know how to show the tabs, so they are spaces in the example above, but in the real file they are real tabs.)

What is the most efficient way to do this? (By the way, it is a remote file.) I would love to have an array containing the data for each product:

$product['number'], $product['name'] and $product['price']

Thanks very much!

+3  A: 

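You could read the file line by line (using the function file(), for instance, which returns each line of the file as one element of an array; pass the FILE_IGNORE_NEW_LINES flag so the trailing newline does not end up in the price).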

And then use explode on each of those lines to separate the fields:

$data_of_line = explode("\t", $string_line);

Using "\t" (the tab character) as the separator.

You'd then have $data_of_line[0] containing the number, $data_of_line[1] the name, and $data_of_line[2] the price.
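A minimal sketch of the explode step on a single line, assuming the trailing newline has already been removed. The sample line is built from the question's data, and the ltrim()/(float) conversion of the price is an addition, not part of the answer above:

```php
<?php
// One tab-separated product line, as file() returns it with FILE_IGNORE_NEW_LINES
$string_line = "ABC001\tan item description\t\$5.50";

// Split on the tab character: [0] number, [1] name, [2] price
$data_of_line = explode("\t", $string_line);

$product = array(
    'number' => $data_of_line[0],
    'name'   => $data_of_line[1],
    // Strip the leading dollar sign and convert the price to a number
    'price'  => (float) ltrim($data_of_line[2], '$'),
);

print_r($product);
```

Skip the ltrim()/(float) step if you prefer to keep the price exactly as it appears in the file.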

Pascal MARTIN
+1  A: 
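$fileArr = file('path.to.your.file.txt', FILE_IGNORE_NEW_LINES);

$productsData = array();

foreach ($fileArr as $line) {
    // number, whitespace, name (lazy), whitespace, then the price after the $ sign;
    // the decimal part of the price is optional so "$6" and "$99" also match
    if (preg_match('/^(\w{3}\d{3})\s+(.+?)\s+\$(\d+(?:\.\d+)?)\s*$/', $line, $matches)) {
        $productsData[] = array(
            'number' => $matches[1],
            'name' => $matches[2],
            'price' => $matches[3]
        );
    }
}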

This will be slower than using explode, but it can also parse files that use more than one tab (or other whitespace) as the separator between values. Plus, you won't have to strip the $ sign from the prices. If you want to keep the $ sign with the price, use this regex instead:

'/^(\w{3}\d{3})\s+(.+?)\s+(\$\d+(?:\.\d+)?)\s*$/'
RaYell
Does \s mean a tab in regex? The product code can also be something else, but I can change that.
Fortega
\s means any whitespace character: tabs, spaces, carriage returns and line feeds. Since we are matching one line at a time, in practice it will only match spaces and tabs here. This covers the case where someone made a mistake and separated values with spaces instead of tabs: this script will still extract the values, while the `explode` solution will not split such a line correctly.
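To illustrate the point about \s+, here is a small sketch that runs the same kind of pattern over a tab-separated line and over one where spaces were used instead. The function name parseProductLine() is hypothetical, and the pattern is a variant that also accepts prices without a decimal part, such as $6:

```php
<?php
// Hypothetical helper: returns the product array, or null if the line does not match
function parseProductLine(string $line): ?array
{
    // \s+ accepts any run of tabs or spaces between the fields
    $pattern = '/^(\w{3}\d{3})\s+(.+?)\s+\$(\d+(?:\.\d+)?)\s*$/';
    if (!preg_match($pattern, $line, $matches)) {
        return null;
    }
    return array(
        'number' => $matches[1],
        'name'   => $matches[2],
        'price'  => $matches[3],
    );
}

$withTabs   = parseProductLine("XYZ999\tan other item\t\$6");
$withSpaces = parseProductLine("XYZ999  an other item    \$6");

// Both separators should produce identical fields
var_dump($withTabs === $withSpaces);
```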
RaYell
+1  A: 

1) The easiest way is to use file() to load all the lines into an array (unless the file is really big; then I would consider another approach).

2) Split each line on the tab ("\t") character.

3) "Format" the array columns as you wish.

Sample snippet:

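$productsArray = file($productsFileName, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

foreach ($productsArray as &$product) {
    // Split the tab-separated line and replace it with an associative array
    $arr = explode("\t", $product);
    $product = array('number' => $arr[0], 'name' => $arr[1], 'price' => $arr[2]);
}
unset($product); // break the reference left over from the foreach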

var_dump($productsArray);
jcinacio
What do you consider to be really big? It contains about 2000 lines, I think. What other approach would you consider?
Fortega
2000 lines sounds fine. For anything over a couple of megabytes I would not keep the *whole* products array in memory, but instead iterate through the items, probably using fgets() to read single lines from the file.
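A minimal sketch of the fgets() approach from the comment above, assuming tab-separated lines: a generator yields one product at a time, so only the current line is held in memory. The function name readProducts() and the temporary sample file are hypothetical. Since the question mentions a remote file, note that fopen() can also open an http:// URL when allow_url_fopen is enabled.

```php
<?php
// Hypothetical generator: yields one product array per non-empty line
function readProducts(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            $line = rtrim($line, "\r\n");   // drop the line ending
            if ($line === '') {
                continue;                    // skip blank lines
            }
            $fields = explode("\t", $line);
            yield array(
                'number' => $fields[0],
                'name'   => $fields[1] ?? '',
                'price'  => $fields[2] ?? '',
            );
        }
    } finally {
        fclose($handle);                     // runs even if the caller stops early
    }
}

// Usage with a small temporary file standing in for the real one
$tmp = tempnam(sys_get_temp_dir(), 'products');
file_put_contents($tmp, "ABC001\tan item description\t\$5.50\n\nXYZ999\tan other item\t\$6\n");

$numbers = array();
foreach (readProducts($tmp) as $product) {
    echo $product['number'], ': ', $product['price'], "\n";
    $numbers[] = $product['number'];
}
unlink($tmp);
```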
jcinacio
+1  A: 

fgetcsv() is a good fit here, since it lets you pass the tab character as the field delimiter.

Check out the example at http://us3.php.net/manual/en/function.fgetcsv.php; here's a slightly modified version:

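$products = array();
$handle = fopen("products.txt", "r");
if ($handle === false) {
    die("Could not open products.txt");
}

// Read one line at a time, splitting on tabs instead of commas
while (($data = fgetcsv($handle, 1000, "\t")) !== FALSE) {
    if (count($data) < 3) {
        continue; // skip blank or malformed lines
    }
    $products[] = array(
        'number' => $data[0],
        'name' => $data[1],
        'price' => $data[2]
    );
}
fclose($handle);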
scott