I am downloading a CSV file from another server as a data feed from a vendor.

I am using curl to get the contents of the file and saving that into a variable called $contents.

I can get to that part just fine, but when I try exploding the contents (by both "\r" and "\n") to get an array of lines, it fails with a fatal memory-allocation error, so I echoed strlen($contents) and it's about 30.5 million characters. I need to manipulate the values and insert them into a database.

What do I need to do to avoid memory allocation errors?

Thanks!

A: 

Spool it to a file. Don't try to hold all that data in memory at once.
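For instance, a minimal sketch (the temporary-file handling is an assumption; $ch is the cURL handle you already have):

// Spool the response straight to a temporary file instead of a PHP variable
$tmpPath = tempnam(sys_get_temp_dir(), 'feed');
$fp = fopen($tmpPath, 'w');
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);
fclose($fp);
// ...then read $tmpPath back one line at a time rather than keeping it all in memory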

Daniel Pryden
+4  A: 
  1. Increase memory_limit in php.ini.
  2. Read data using fopen() and fgets().
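A minimal sketch of both steps (the limit value and the file path are placeholders):

// 1. Raise the limit at runtime (or change memory_limit in php.ini)
ini_set('memory_limit', '256M');   // example value only

// 2. Read the saved feed line by line instead of exploding one huge string
$fh = fopen('/path/to/feed.csv', 'r');   // hypothetical path
while (($line = fgets($fh)) !== false) {
    // handle a single line here
}
fclose($fh);
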
pingw33n
+3  A: 

You might want to consider saving it to a temporary file, and then reading it one line at a time using fgets() or fgetcsv().

This way you avoid the initial big array you get from exploding such a large string.
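For example, a minimal sketch using fgetcsv() (the temporary file path is a placeholder):

$handle = fopen('/tmp/feed.csv', 'r');   // the temporary file you saved
while (($row = fgetcsv($handle)) !== false) {
    // $row is an array of the fields from one CSV line;
    // manipulate the values and insert them into the database here
}
fclose($handle);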

Sebastian P.
+9  A: 

PHP is choking because it's running out of memory. Instead of having curl populate a PHP variable with the contents of the file, use the CURLOPT_FILE option to save the file to disk instead.

//pseudo, untested code to give you the idea
//($ch is assumed to be your existing cURL handle)

$fp = fopen('path/to/save/file', 'w');
curl_setopt($ch, CURLOPT_FILE, $fp);  // write the response straight to the file
curl_exec($ch);
curl_close($ch);
fclose($fp);

Then, once the file is saved, instead of using the file or file_get_contents functions (which would load the entire file into memory, killing PHP again), use fopen and fgets to read the file one line at a time.
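Something along these lines (an untested sketch, reusing the placeholder path from above):

$fp = fopen('path/to/save/file', 'r');
while (($line = fgets($fp)) !== false) {
    $line = rtrim($line, "\r\n");   // strip the line ending
    // manipulate the values and insert them into the database here
}
fclose($fp);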

Alan Storm
+8  A: 

Hi,

As other answers said :

  • you can't have all that in memory
  • a solution would be to use CURLOPT_FILE

But you might not want to actually create a file; you might want to work with the data in memory, using it as soon as it "arrives".

One possible solution is to define your own stream wrapper and use it, instead of a real file, with CURLOPT_FILE.

First of all, see the PHP manual's documentation on stream wrappers (the streamWrapper class) and stream_wrapper_register().


And now, let's go with an example.

First, let's create our stream wrapper class :

class MyStream {
    protected $buffer;

    function stream_open($path, $mode, $options, &$opened_path) {
        // Has to be declared, it seems...
        return true;
    }

    public function stream_write($data) {
        // Extract the lines; in my tests, $data was 8192 bytes long, never more
        $lines = explode("\n", $data);

        // The buffer contains the end of the last line from the previous call
        // => It goes at the beginning of the first line we are getting this time
        $lines[0] = $this->buffer . $lines[0];

        // And the last line is only partial
        // => save it for next time, and remove it from the list this time
        $nb_lines = count($lines);
        $this->buffer = $lines[$nb_lines-1];
        unset($lines[$nb_lines-1]);

        // Here, do your work with the lines you have in the buffer
        var_dump($lines);
        echo '<hr />';

        return strlen($data);
    }
}

What I do is :

  • work on the chunks of data (I use var_dump, but you'd do your usual stuff instead) when they arrive
  • Note that you don't get "full lines": the end of a line is at the beginning of a chunk, and the beginning of that same line was at the end of the previous chunk; so you have to keep part of a chunk between the calls to stream_write


Next, we register this stream wrapper, to be used with the pseudo-protocol "test" :

// Register the wrapper
stream_wrapper_register("test", "MyStream")
    or die("Failed to register protocol");


And now we do our curl request, just as we would when writing to a "real" file, like the other answers suggested:

// Open the "file"
$fp = fopen("test://MyTestVariableInMemory", "r+");

// Configuration of curl
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.rue89.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_BUFFERSIZE, 256);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FILE, $fp);    // Data will be sent to our stream ;-)

curl_exec($ch);

curl_close($ch);

// Don't forget to close the "file" / stream
fclose($fp);

Note we don't work with a real file, but with our pseudo-protocol.


This way, each time a chunk of data arrives, the MyStream::stream_write() method gets called and can work on a small amount of data (when I tested, I always got 8192 bytes at a time, whatever value I used for CURLOPT_BUFFERSIZE).


A few notes :

  • You'll need to test this more than I did, obviously
  • my stream_write implementation will probably not work if lines are longer than 8192 bytes; up to you to patch it ;-)
  • It's only meant as a few pointers, not a fully-working solution: you'll have to test (again), and probably code a bit more!
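One more gap worth noting: when the transfer finishes, whatever is still sitting in $this->buffer (the final line, which usually has no trailing "\n") never gets processed. A hedged sketch of a fix, keeping the var_dump placeholder from above, is to flush it in stream_close(), which PHP calls when the stream is fclose()d:

// Possible addition to the MyStream class above
public function stream_close() {
    // Process whatever is left in the buffer: the final, unterminated line
    if ($this->buffer !== null && $this->buffer !== '') {
        var_dump(array($this->buffer));
    }
    $this->buffer = '';
}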

Still, I hope this helps ;-)
Have fun !

Pascal MARTIN
A: 

NB:

"Basically, if you open a file with fopen, fclose it and then unlink it, it works fine. But if between fopen and fclose, you give the file handle to cURL to do some writing into the file, then the unlink fails. Why this is happening is beyond me. I think it may be related to Bug #48676"

http://bugs.php.net/bug.php?id=49517

So be careful if you're on an older version of PHP. There is a simple fix on this page to double-close the file resource:

// Work around the bug: close once, then close again if the handle is somehow still open
fclose($fp);
if (is_resource($fp))
    fclose($fp);
Mark White