Hi,

I'm busy with a project in CakePHP where I need to parse a couple of XML files and insert the relevant data into my MySQL database. The script inserts what it should insert; that's not the problem. For example, if I parse one or two files (approx. 7000-8000 records), nothing goes wrong.

Problems start when I parse the third or fourth XML file. After a minute of inserting I can see 9000-10000 records successfully inserted in the database, but then the script suddenly seems to restart itself: the table drops back to 0 records and it starts inserting them all over again. So the script just takes ages to execute.

Short snippet:

$content = simplexml_load_file($file);

/**
 * Process the feed product by product
 */
foreach ($content->product as $line) {
  // build a new record for the products table
  $product = array();
  $product['Product']['productid'] = $line->attributes()->sku_number;
  $product['Product']['name'] = $line->attributes()->name;
  $product['Product']['description'] = empty($line->description->long) ? $line->description->short : $line->description->long;
  $product['Product']['link'] = $line->URL->product;
  $product['Product']['affiliate'] = 'linkshare';
  $product['Product']['price'] = $line->price->retail;
  $product['Product']['brand'] = strtolower($line->brand);
  $product['Product']['image'] = $line->URL->productImage;

  // if it is not in rejectedproducts, save the new product to the database
  if (!$rejectedproductModel->findByProductid($product['Product']['productid'])) {
    $productModel->create();
    $productModel->save($product);
  }
}

Does anybody have experience with this? What could be the cause and, more importantly, what could be a solution? :)

Thanks

A: 

I'll show some of the code. The parsing of the feeds is triggered like this: the __parsedirectory method checks all the XML files in the specified folder and parses each one by calling the linkshare action and passing it the filename.

function index() {
    set_time_limit(0);

    #$this->updateFeeds();

    App::import('Model', 'Product');
    $productModel = new Product();
    # truncate table products before adding new records to avoid duplicate records
    $productModel->query('TRUNCATE TABLE products');

    # parse all files from shareasale
    #$this->__parsedirectory('feeds/shareasale');
    # parse all files from linkshare
    $this->__parsedirectory('feeds/linkshare');

    # send mails where necessary
    $this->redirect(array('controller' => 'subscriptions', 'action' => 'sendmails'));
}

Private functions

function __parsedirectory($dir) {   
    # retrieve name affiliate out of directory
    $affiliate = explode('/', $dir);
    $affiliate = $affiliate[1];     

    $dh = opendir($dir);
    while (($file = readdir($dh)) !== false) {
        if ($file != '.' && $file != '..' && !$this->__endswith($file, 'gz')) {
            $this->requestAction('/parse/' . $affiliate . '/file:' . $file);
            $this->Session->setFlash($affiliate . '/' . $file . ' parsed');
        }
    }
    closedir($dh);
    $this->autoRender = false;
}
Laurent
You should edit your question instead of adding an answer next time
Phill Pafford
Oh, I see, I didn't know that. I'll keep it in mind.
Laurent
A: 

I think the problem lies in this section of code:

    # truncate table products before adding new records to avoid duplicate records
    $productModel->query('TRUNCATE TABLE products');

This is a poor way to avoid duplicate records; that should be handled with constraints on the database. That said, somehow this bit of code is being run again in the middle of the process.
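
For example, a unique key on productid plus an upsert would let the database enforce this itself. A minimal sketch, assuming the column names from the snippets above and using example values:

    // one-off schema change: let MySQL reject/merge duplicates on productid
    $productModel->query('ALTER TABLE products ADD UNIQUE KEY idx_productid (productid)');

    // upsert instead of a plain insert, so a re-run never creates duplicates
    // (the concrete values here are just placeholders)
    $productModel->query(
        "INSERT INTO products (productid, name, price)
         VALUES ('ABC123', 'Example product', 19.99)
         ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price)"
    );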

Is this set up as a cron job or is it being run automatically somehow? If so, what is happening is that the previous file has not finished parsing when the next one starts.
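
If overlapping runs turn out to be the issue, a simple lock file around the whole import would guard against it. A rough sketch (the lock filename is just an example; TMP is CakePHP's tmp path constant):

    // skip this run if a previous import still holds the lock
    $lock = fopen(TMP . 'import.lock', 'c');
    if (!flock($lock, LOCK_EX | LOCK_NB)) {
        die('Previous import still running, exiting.');
    }

    // ... truncate + parse the feed directories here ...

    flock($lock, LOCK_UN);
    fclose($lock);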

cdburgess
I don't really do the TRUNCATE just to avoid duplicate records ;) I want the products table to contain only the products currently present in the XML feeds, so I need to empty it completely so there won't be any old records left in it. This piece of code should definitely be executed. And indeed, it does seem to be run several times; the question is why :) For now I run it automatically, but the goal is to do it in a cron job when it goes into production. I noticed everything goes fine when I execute the same script on my localhost. Maybe it's a memory issue?
Laurent
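
If the goal is "only products currently in the feeds", another option is to load into a staging table and swap it in at the end, so the live table is never empty mid-import. A sketch, where products_new is a hypothetical staging table with the same schema:

    // build the fresh data set in a staging table, then swap it in atomically
    $productModel->query('CREATE TABLE products_new LIKE products');
    // ... insert all parsed records into products_new here ...
    $productModel->query('RENAME TABLE products TO products_old, products_new TO products');
    $productModel->query('DROP TABLE products_old');
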
I just noticed this is `function index()`, are you running this in the browser? If so, you should consider running it command line. I will bet that command line will not have the same problems. It could be that after a certain amount of time, the browser tries to reload the page.
cdburgess
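
One way to run it from the command line in CakePHP 1.x is a console shell, roughly like this (a sketch; ParseShell, its file location, and the reuse of the parsing logic are assumptions):

    // app/vendors/shells/parse.php, run with the cake console, e.g. `cake parse`
    class ParseShell extends Shell {
        var $uses = array('Product');   // makes the Product model available as $this->Product

        function main() {
            // no browser involved, so no page reloads and no web server timeouts
            $this->Product->query('TRUNCATE TABLE products');
            // ... parse the feed directories here, as in __parsedirectory() ...
            $this->out('Import finished.');
        }
    }
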
I ran this in the browser, yes :) I'll give it a shot, although I guess the problem will persist, as it ran fine when I ran it on localhost. Thx
Laurent
I've transferred the CakePHP project from a shared hosting server to another, more powerful dedicated server, and now it runs smoothly :) All the records get inserted like they should and the script doesn't restart itself, even when I run it in the browser. So it seems it was just a memory issue.
Laurent
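
If memory really was the bottleneck on the shared host, a streaming parser would likely have helped as well. A minimal sketch with XMLReader, assuming the same <product> structure as the snippet in the question:

    // stream the feed instead of loading the whole file with simplexml_load_file()
    $reader = new XMLReader();
    $reader->open($file);

    // skip ahead to the first <product> element
    while ($reader->read() && $reader->name !== 'product');

    $doc = new DOMDocument();
    while ($reader->name === 'product') {
        // expand only the current <product> node into a SimpleXMLElement
        $line = simplexml_import_dom($doc->importNode($reader->expand(), true));
        // ... build and save $product exactly as in the question ...
        $reader->next('product');   // move to the next <product>, keeping memory flat
    }

    $reader->close();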