My website allows users to upload a CSV file with a list of books. The script reads this file and checks each ISBN against Amazon, using the PEAR Services_Amazon class, returning enhanced book data. However, whenever I run the script on a list of books, the memory consumed steadily increases until I get a fatal error. At the moment, with 32 MB allocated, I can only read 370 records of the CSV file before it crashes.

I have a user with a 4500 record file to import and a virtual server with 256 MB of RAM, so increasing the memory limit is not a solution.

Here is a simplified version of the CSV import:

$handle = fopen($filename, "r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $isbn = $data[6];
    checkIsbn($isbn);
}
fclose($handle);

Here is a trimmed version of the function:

function checkIsbn($isbn) {
    $amazon = &new Services_Amazon(ACCESS_KEY_ID, SECRET_KEY, ASSOC_ID);
    // -- $options array filled with $isbn, other requested info --
    $products = $amazon->ItemSearch('Books', $options);
    // -- Then I create an array from the first result --
    $product = $products['Item'][0];
    $title = $product['ItemAttributes']['Title'];
    // -- etc... various attributes are pulled from the $product array --
    mysql_query($sql); // -- put attributes into our DB
    unset($product);
    unset($products);
    usleep(1800000); // maximum of 2000 calls to Amazon per hour as per their API
    return $book_id;
}

What I've tried: unsetting the arrays as well as setting them to NULL, both in the function and in the CSV import code. I have increased all my timeouts to ensure that's not an issue. I installed xdebug and ran some tests, but all I found was that memory usage just kept increasing each time the Amazon class was accessed (I'm no xdebug expert). I'm thinking that maybe the variables in the Services_Amazon class are not being cleared each time it's run, but I have no idea where to go from here. I'd hoped unsetting the two arrays would do it, but no luck.
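A lighter-weight alternative to xdebug for narrowing this down is to log memory_get_usage() every few rows and watch where the growth happens. A minimal stand-alone sketch, assuming the same column layout as above; the file path and the empty checkIsbn() stub are placeholders for the real code:

```php
<?php
// Sketch: log memory usage every 50 rows to pinpoint where growth happens.
// checkIsbn() here is an empty stub standing in for the real Amazon lookup.
function checkIsbn($isbn) { /* real lookup would go here */ }

// Tiny sample file so the sketch runs stand-alone (7 columns, ISBN last).
file_put_contents("/tmp/books.csv", str_repeat("t,a,p,y,e,x,9780306406157\n", 120));

$handle = fopen("/tmp/books.csv", "r");
$row = 0;
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    checkIsbn($data[6]);
    if (++$row % 50 === 0) {
        // memory_get_usage(true) reports the real size allocated from the system
        printf("row %d: %.2f MB\n", $row, memory_get_usage(true) / 1048576);
    }
}
fclose($handle);
```

If the reported number climbs by a roughly constant amount per row, something allocated inside the per-row work is never being released.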

Update: I've decided that this may be a problem in the PEAR class (and looking at some of the questions here relating to PEAR, this does seem possible). Anyway, my OOP skills are very limited at the moment, so I found a way to do this by reloading the page multiple times - see my answer below for details.

A: 

How about only creating a single instance of the $amazon object and passing it in to your checkIsbn function? Then you wouldn't need to create 4500 instances.

$amazon = &new Services_Amazon(ACCESS_KEY_ID, SECRET_KEY, ASSOC_ID);
$handle = fopen($filename, "r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $isbn = $data[6];
    checkIsbn($amazon, $isbn);
}
unset($amazon);

Aaron D
That gives a fatal error "Object of class Services_Amazon could not be converted to string".
mandel
You need to change the method signature for checkIsbn to take another parameter. It should be: function checkIsbn($amazon, $isbn) {
Aaron D
I redid it and it didn't throw the error this time; I must have had a typo when I added the parameters. Alas, this doesn't change the processing time / memory issue at all.
mandel
+3  A: 

First of all, this is not a memory leak but bad programming... Second point is that unset() won't free the used memory; it just removes the reference to the variable from the current scope.

Also, better not to copy the data here; make $product and $title references by assigning only references into $products:

$product = &$products['Item'][0];
$title = &$product['ItemAttributes']['Title']; 

Then, instead of unset() alone, do:

$products = NULL;
unset($products);

This will free the memory, not immediately but when the PHP garbage collector runs the next time...

Also, why do you create a new instance of Services_Amazon each time the function is called? What about a class member, so the instance is created once when your object is constructed:

class myService
{
    protected $_service;

    public function __construct()
    {
        $this->_service = new Services_Amazon(ACCESS_KEY_ID, SECRET_KEY, ASSOC_ID);
    }

    public function checkIsbn($isbn)
    {
        //...
        $this->_service->ItemSearch('Books', $options);
        //...
    }
}

$myService = new myService;
$handle = fopen($filename, "r");
while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
    $bookId = $myService->checkIsbn($data[6]);
}

And furthermore, you assume that all your users use the same CSV format, which is very unlikely... so better to use a real CSV parser which can handle all possible CSV notations.
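On the parser point, PHP's bundled SplFileObject can also stream-parse CSV one row at a time, so memory stays flat regardless of file size. A small sketch, assuming the ISBN stays in column 6 as in the question; the sample file is created inline so it runs stand-alone:

```php
<?php
// Sketch: streaming CSV iteration with SplFileObject (SPL, PHP >= 5.1).
// READ_CSV makes foreach yield one parsed row at a time.
file_put_contents("/tmp/books.csv", "Title,Author,,,,,9780306406157\n");

$file = new SplFileObject("/tmp/books.csv");
$file->setFlags(SplFileObject::READ_CSV | SplFileObject::SKIP_EMPTY);
foreach ($file as $row) {
    if (!is_array($row) || !isset($row[6])) {
        continue;             // skip blank or short lines
    }
    $isbn = $row[6];          // same column position as in the question
    echo $isbn, "\n";
}
```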

zolex
Bad programming is my middle name... Thanks for your thoughts. I knew that unset wasn't quite right here. I'll look at the class you've suggested. As for the CSV, for simplicity I stripped out other code that would have made this clearer - this part of the script is actually taking CSV from a known format source which does not change.
mandel
Alright... please tell us how much you can import now. You may have other memory-wasting loops etc...
zolex
I will once I can figure out how to set up this class. I'm very new to classes and a bit confused. How do I get the $products array from $this->_service->ItemSearch?
mandel
In checkIsbn, instead of: $products = $amazon->ItemSearch('Books', $options); use: $products = $this->_service->ItemSearch('Books', $options);
Aaron D
@mandel, you can simply use my code example as a base for your class. The commented lines are where you have to put your functionality. Also, the last lines of the example will work 1:1 with your posted code.
zolex
Thanks for your help with this. I did use your example as a base, and I also tried a couple of other ways of setting it up, but kept getting the memory error. I've looked into the PEAR class and there is a lot going on that is beyond me (it may be badly programmed, who knows?), so until I'm better educated about classes, I've decided to do this a different way. It works if I read the CSV into a temporary table, then loop through it 300 records at a time, reloading the page until all the records are processed. A bit ugly, but it works.
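The batch-at-a-time workaround described above can be sketched roughly as follows. PDO with an in-memory SQLite table stands in for the real MySQL staging table (the table name import_queue is made up), and a plain loop stands in for the page reloads; in the real script the offset would come from the query string and each full batch would end with a redirect:

```php
<?php
// Sketch: process the staging table in slices of 300 rows, the way the
// reload-per-batch workaround does. SQLite in memory keeps it runnable.
$db = new PDO('sqlite::memory:');
$db->exec("CREATE TABLE import_queue (isbn TEXT)");
$ins = $db->prepare("INSERT INTO import_queue VALUES (?)");
for ($i = 0; $i < 1000; $i++) {
    $ins->execute(array("978000000" . $i));   // dummy ISBNs for the sketch
}

$batchSize = 300;
$total = 0;
for ($offset = 0; ; $offset += $batchSize) {
    $stmt = $db->query("SELECT isbn FROM import_queue LIMIT $batchSize OFFSET $offset");
    $rows = $stmt->fetchAll(PDO::FETCH_COLUMN);
    if (!$rows) {
        break;                                // queue drained
    }
    foreach ($rows as $isbn) {
        // checkIsbn($isbn) would run here; just count for the sketch
        $total++;
    }
    // real script: header("Location: import.php?offset=" . ($offset + $batchSize)); exit;
}
echo "processed $total rows\n";
```

Since each page load starts with a fresh PHP process, any memory leaked per batch is discarded at the redirect, which is why this works even though the underlying leak remains.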
mandel
Ouch ^^ Well, then the real memory-consuming part seems not to be in the source you exposed.
zolex
A: 

I think you should also look into how you are connecting to the database - are you creating fresh connections each time checkIsbn is called? That could also be part of the problem.
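A sketch of the pattern being suggested: open one connection and one prepared statement up front and reuse them for every row. PDO with SQLite is used here only so the sketch runs stand-alone; the question's code uses the old mysql_* extension, and the books table and saveBook() helper are illustrative, not from the question:

```php
<?php
// Sketch: one connection and one prepared statement for the whole import,
// rather than connecting or re-preparing inside the per-row function.
function saveBook(PDO $db, $isbn, $title) {
    static $stmt = null;
    if ($stmt === null) {
        // prepared once on the first call, reused for every later row
        $stmt = $db->prepare("INSERT INTO books (isbn, title) VALUES (?, ?)");
    }
    $stmt->execute(array($isbn, $title));
}

$db = new PDO('sqlite::memory:');   // single connection for the whole run
$db->exec("CREATE TABLE books (isbn TEXT, title TEXT)");

saveBook($db, "9780306406157", "Example");
saveBook($db, "9780306406158", "Example 2");
echo $db->query("SELECT COUNT(*) FROM books")->fetchColumn(), "\n";
```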

kguest
There's one main mysql_connect/mysql_select_db at the start of the page; in the loop I am adding a record to the database only, so I don't believe this is a fresh connection, is it?
mandel