views:

50

answers:

3

I have this large (and oddly formatted) txt file from the USDA's website: the NUT_DATA.txt file.

But the problem is that it is almost 27 MB! I was able to import a few other, smaller files, but my method used file_get_contents, so it makes sense that an error is thrown when I try to pull 27+ MB into RAM.

So how can I import this massive file into my MySQL DB without running into timeout and RAM issues? I've tried getting just one line at a time from the file, but that ran into a timeout issue too.

Using PHP 5.2.0.

Here is the old script (the fields in the DB are just numbers because I could not figure out which number represented which nutrient; I found this data very poorly documented. Sorry about the ugliness of the code):

<?php

    $file = "NUT_DATA.txt";

    $data = split("\n", file_get_contents($file)); // split each line

    $link = mysql_connect("localhost", "username", "password");
    mysql_select_db("database", $link);

    for($i = 0, $e = sizeof($data); $i < $e; $i++)
    {
        $sql = "INSERT INTO `USDA` (`1`,`2`,`3`,`4`,`5`,`6`,`7`,`8`,`9`,`10`,`11`,`12`,`13`,`14`,`15`,`16`,`17`) VALUES("; // numeric column names must be backticked
        $row = split("\^", trim($data[$i])); // split each line by caret
        for ($j = 0, $k = sizeof($row); $j < $k; $j++) {
            $val = trim($row[$j], '~');      // strip the tildes
            $val = (empty($val)) ? 0 : $val; // replace empty strings with 0s
            $sql .= $val . ',';
        }
        $sql = rtrim($sql, ',') . ");";
        mysql_query($sql) or die(mysql_error()); // query the db
    }

    echo "Finished inserting data into database.\n";

    mysql_close($link);

?>
A: 

You can increase the amount of memory each script can use by setting this value in php.ini:

memory_limit = 64M  
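If you can't edit php.ini, the limit can often be raised at runtime instead (assuming your host hasn't locked the setting):

```php
<?php
// Raise this script's memory cap at runtime. ini_set returns false
// (and the old limit stays in effect) if the host disallows the change.
ini_set('memory_limit', '64M');
?>
```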

Having said this: do you have to use PHP? Other scripting languages (like Python) might be more appropriate for this kind of task.

Pablo Santa Cruz
PHP is fine for a task like this!
mmattax
+1  A: 

Read the file line by line so that you're not loading the entire file into memory. Use

set_time_limit(0);

to avoid having your script time out.

http://php.net/manual/en/function.set-time-limit.php
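In the scripts in this question it would go near the top, before the import loop starts:

```php
<?php
// Lift PHP's default 30-second execution limit for this long-running
// import (note: set_time_limit has no effect in safe mode on PHP 5.2).
set_time_limit(0);
?>
```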

mmattax
+2  A: 

If you have to use PHP, you can read the file line by line using fopen and fgets

<?php

$file = "NUT_DATA.txt";
$fh = fopen( $file, "r" );    // open the file for reading
if ( !$fh ) die( "Could not open $file" );
$link = mysql_connect("localhost", "username", "password");
mysql_select_db("database", $link);

while( !feof( $fh ) )
{
    $data = fgets( $fh, 4096 );     // read line from file
    if ( $data === false || trim( $data ) === '' ) continue; // skip EOF / blank lines

    $sql = "INSERT INTO `USDA` (`1`,`2`,`3`,`4`,`5`,`6`,`7`,`8`,`9`,`10`,`11`,`12`,`13`,`14`,`15`,`16`,`17`) VALUES("; // numeric column names must be backticked
    $row = split("\^", trim($data)); // split each line by caret
    for ($j = 0, $k = sizeof($row); $j < $k; $j++) {
        $val = trim($row[$j], '~');      // strip the tildes
        $val = (empty($val)) ? 0 : $val; // replace empty strings with 0s
        $sql .= $val . ',';
    }
    $sql = rtrim($sql, ',') . ");";
    mysql_query($sql) or die(mysql_error()); // query the db
}

echo "Finished inserting data into database.\n";

fclose( $fh );

mysql_close($link);

?>

Check out the fgets documentation for more info.
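As a side note, the per-line parsing in the loop can be pulled into a small helper. This is just a sketch of the same logic (explode is used instead of split, and the caret-separated, tilde-quoted field format is taken from the question; the helper name is made up):

```php
<?php
// Parse one line of NUT_DATA.txt: fields are caret-separated and
// string fields are wrapped in tildes; empty fields become 0.
// A strict === '' check is used so a real "0" value survives.
function parse_nut_line($line)
{
    $values = array();
    foreach (explode('^', trim($line)) as $field) {
        $field = trim($field, '~');
        $values[] = ($field === '') ? 0 : $field;
    }
    return $values;
}

// Hypothetical sample line in the NUT_DATA.txt format:
$row = parse_nut_line("~01001~^203^0.85^~~\n");
// $row is array('01001', '203', '0.85', 0)
?>
```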

johnny_bgoode
Using this, set_time_limit, and waiting patiently I managed to get the entire file in a database. Now I just have to figure out what is what in this data : )
Tom