views:

332

answers:

2

Hi!

On the project which I am currently working, I have to read an Excel file (with over a 1000 rows), extract all them and insert/update to a database table.

1ª Question: in terms of performance, is better to add all the records to a Doctrine_Collection and insert/update them after using the fromArray() method, right? One other possible approach is to create a new object for each row (a Excel row will be a object) and them save it but I think its worst in terms of performance.

2ª Question: Every time the Excel is uploaded, it is necessary to compare its rows to the existing objects on the database. If the row does not exist as object, should be inserted, otherwise updated. My first approach was turn both object and rows into arrays (or Doctrine_Collections); then compare both arrays before implementing the needed operations.

Can anyone suggest me any other possible approach?

Thanks all for the help, Best regards!

+1  A: 

I have never worked on Doctrine_Collections, but I can answer in terms of database queries and code logic in a broader sense. I would apply the following logic:-

  1. Fetch all the rows of the excel sheet from database in a single query and store them in an array $uploadedSheet.

  2. Create a single array of all the rows of the uploaded excel sheet, call it $storedSheet. I guess the structures of the Doctrine_Collections $uploadedSheet and $storedSheet will be similar (both two-dimensional - rows, cells can be identified and compared).

3.Run foreach loops on the $uploadedSheet as follows and only identify the rows which need to be inserted and which to be updated (do actual queries later)-

$rowsToBeUpdated =array();
$rowsToBeInserted=array();
foreach($uploadedSheet as $row=>$eachRow)  
{  
    if(is_array($storedSheet[$row]))
    {
         foreach($eachRow as $column=>$value)
         {
              if($value != $storedSheet[$row][$column]) 
              {//This is a representation of comparison
                  $rowsToBeUpdated[$row]=true;
                  break; //No need to check this row anymore - one difference detected.
              }
         }
    }
    else
    {
         $rowsToBeInserted[$row] = true;
    }
} 

4. This way you have two arrays. Now perform 2 database queries -

  • bulk insert all those rows of $uploadedSheet whose numbers are stored in $rowsToBeInserted array.

  • bulk update all the rows of $uploadedSheet whose numbers are stored in $rowsToBeUpdated array.

These bulk queries are the key to faster performance.

Let me know if this helped, or you wanted to know something else.

sandeepan
I will carefully analyze your solution proposal. Thanks in advance for the hep.
Rui Gonçalves
you are welcome
sandeepan
A: 

Hi.

We did a bit of this in a project recently, with CSV data. it was fairly painless. There's a symfony plugin tmCsvPlugin, but we extended this quite a bit since so the version in the plugin repo is pretty out of date. Must add that to the @TODO list :)

Question 1:

I don't explicitly know about performance, but I would guess that adding the records to a Doctrine_Collection and then calling Doctrine_Collection::save() would be the neatest approach. I'm sure it would be handy if an exception was thrown somewhere and you had to roll back on your last save..

Question 2:

If you could use a row field as a unique indentifier, (let's assume a username), then you could search for an existing record. If you find a record, and assuming that your imported row is an array, use Doctrine_Record::synchronizeWithArray() to update this record; then add it to a Doctrine_Collection. When complete, just call Doctrine_Collection::save()

A fairly rough 'n' ready implementation:

// set up a new collection
$collection = new Doctrine_Collection('User');

// assuming $row is an associative 
// array representing one imported row.

foreach ($importedRows as $row) {

    // try to find an existing record 
    // based on a unique identifier.
    $user = Doctrine_Core::getTable('User')
        ->findOneByUsername($row['username']);

    // create a new user record if 
    // no existing record is found.
    if (!$user instanceof User) {
        $user = new User();
    }

    // sync record with current data.
    $user->synchronizeWithArray($row);

    // add to collection.
    $collection->add($user);

}

// done. save collection.
$collection->save();

Pretty rough but something like this worked well for me. This is assuming that you can use your imported row data in some way to serve as a unique identifier.

NOTE: be wary of synchronizeWithArray() if you're using sf1.2/doctrine 1.0 - if I remember correctly it was not implemented correctly. it works fine in doctrine 1.2 though.

Darragh
For now, just a small correction. The invoked method on the User table object must be findOneByUsername(), otherwise the return object will be a Doctrine_Collection instead of a user.
Rui Gonçalves
sorry I was doing this off the top of my head. you're absolutely correct. fixed.
Darragh