views:

457

answers:

4

Right, this code goes through a rather large multidimensional array (has about 28,000 rows and 16 parts).

Order of events:

  1. Check if the data exists in the database
  2. if it exists - Update it with the new data
  3. if it doesn't exist - Insert it

Simple.

But right now to go through this it has taken over 30min i think and Still going.

$counter = 0;
$events = sizeof($feed_array['1'])-1;
while($counter <= $events ) {

 $num_rows = mysql_num_rows(mysql_query("SELECT * FROM it_raw WHERE perfID = '".addslashes($feed_array['16'][$counter])."'")); 
 if($num_rows) {
  $eventDate=explode("/", $feed_array['1'][$counter]); //print_r($eventDate);
  $eventTime=explode(":", $feed_array['2'][$counter]); //print_r($eventTime);  
  $eventUnixTime=mktime($eventTime[0], $eventTime[1], "00", $eventDate[1], $eventDate[0], $eventDate[2]);

  mysql_query("UPDATE `it_raw` SET  
     `eventtime` =    '".$eventUnixTime."',
     `eventname` =    '".addslashes($feed_array['3'][$counter])."',
     `venuename` =    '".addslashes($feed_array['4'][$counter])."',
     `venueregion` =   '".addslashes($feed_array['5'][$counter])."',
     `venuepostcode` =   '".addslashes($feed_array['6'][$counter])."',
     `country` =    '".addslashes($feed_array['7'][$counter])."',
     `minprice` =    '".addslashes($feed_array['8'][$counter])."',
     `available` =    '".addslashes($feed_array['9'][$counter])."',
     `link` =     '".addslashes($feed_array['10'][$counter])."',
     `eventtype` =    '".addslashes($feed_array['11'][$counter])."',
     `seaOnSaleDate` =   '".addslashes($feed_array['12'][$counter])."',
     `perOnSaleDate` =   '".addslashes($feed_array['13'][$counter])."',
     `soldOut` =    '".addslashes($feed_array['14'][$counter])."',
     `eventImageURL` =   '".addslashes($feed_array['15'][$counter])."',
     `perfID`=     '".addslashes($feed_array['16'][$counter])."'
     WHERE  `perfID` = ".$feed_array['16'][$counter]." LIMIT 1 ;");
  echo "UPDATE ".$feed_array['16'][$counter].": ".addslashes($feed_array['3'][$counter])."\n";
 } else {
  $eventDate=explode("/", $feed_array['1'][$counter]); //print_r($eventDate);
  $eventTime=explode(":", $feed_array['2'][$counter]); //print_r($eventTime);  
  $eventUnixTime=mktime($eventTime[0], $eventTime[1], "00", $eventDate[1], $eventDate[0], $eventDate[2]);
  $sql = "INSERT INTO  `dante_tickets`.`it_raw` (
    `id` ,
    `eventtime` ,
    `eventname` ,
    `venuename` ,
    `venueregion` ,
    `venuepostcode` ,
    `country` ,
    `minprice` ,
    `available` ,
    `link` ,
    `eventtype` ,
    `seaOnSaleDate` ,
    `perOnSaleDate` ,
    `soldOut` ,
    `eventImageURL` ,
    `perfID`
    )
    VALUES (
     NULL ,  
     '".$eventUnixTime."',  
     '".addslashes($feed_array['3'][$counter])."',  
     '".addslashes($feed_array['4'][$counter])."',  
     '".addslashes($feed_array['5'][$counter])."',  
     '".addslashes($feed_array['6'][$counter])."',  
     '".addslashes($feed_array['7'][$counter])."',  
     '".addslashes($feed_array['8'][$counter])."',  
     '".addslashes($feed_array['9'][$counter])."',  
     '".addslashes($feed_array['10'][$counter])."',  
     '".addslashes($feed_array['11'][$counter])."',  
     '".addslashes($feed_array['12'][$counter])."',  
     '".addslashes($feed_array['13'][$counter])."',  
     '".addslashes($feed_array['14'][$counter])."',  
     '".addslashes($feed_array['15'][$counter])."',  
     '".addslashes($feed_array['16'][$counter])."'
    );";

  mysql_query($sql) or die(mysql_error().":".$sql);
  echo "Inserted ".$feed_array['16'][$counter].": ".addslashes($feed_array['3'][$counter])."\n";
 }
 unset($sql);
 $counter++;  
}

UPDATE

I just carried out profiling one one of the rows:

mysql> EXPLAIN SELECT * FROM it_raw WHERE perfID = 210968;
+----+-------------+--------+------+---------------+--------+---------+-------+------+-------+
| id | select_type | table  | type | possible_keys | key    | key_len | ref   | rows | Extra |
+----+-------------+--------+------+---------------+--------+---------+-------+------+-------+
|  1 | SIMPLE      | it_raw | ref  | perfID        | perfID | 4       | const |    1 |       | 
+----+-------------+--------+------+---------------+--------+---------+-------+------+-------+
1 row in set (0.07 sec)

UPDATE 2

To try and "speed" things up, instead of carrying out the UPDATE and INSERT statements straight away, i've now placed them in a variable (so only the initial select runs - to check a duplicate - then stores the action [insert or update]). At the end of the loop it executes all the statments.

Except now, it's coming up with MySQL error that the syntax is incorrect. (when initially there was nothing wrong).

I've simply replaced the mysql_query with:

$sql_exec .= "SELECT.... ;";

is there something i'm missing here for the formatting?

UPDATE 3 OK finally fixed it Lessons Learned:

  1. Do logic search first on database
  2. Carry out insert/updates in bulk.

Here is the final code which now takes about 60 seconds to run (from over 30min+)

while($counter <= $events ) {

     $num_rows = mysql_num_rows(mysql_query("SELECT * FROM it_raw WHERE perfID = '".addslashes($feed_array['16'][$counter])."'")); 
     if($num_rows) {
      $eventDate=explode("/", $feed_array['1'][$counter]); //print_r($eventDate);
      $eventTime=explode(":", $feed_array['2'][$counter]); //print_r($eventTime);  
      $eventUnixTime=mktime($eventTime[0], $eventTime[1], "00", $eventDate[1], $eventDate[0], $eventDate[2]);

      $sql_exec[] = "UPDATE `it_raw` SET `eventtime` = '".$eventUnixTime."',`eventname` = '".addslashes($feed_array['3'][$counter])."',`venuename` = '".addslashes($feed_array['4'][$counter])."',`venueregion` = '".addslashes($feed_array['5'][$counter])."',`venuepostcode` = '".addslashes($feed_array['6'][$counter])."',`country` = '".addslashes($feed_array['7'][$counter])."',`minprice` = '".addslashes($feed_array['8'][$counter])."',`available` = '".addslashes($feed_array['9'][$counter])."',`link` = '".addslashes($feed_array['10'][$counter])."',`eventtype` = '".addslashes($feed_array['11'][$counter])."',`seaOnSaleDate` = '".addslashes($feed_array['12'][$counter])."',`perOnSaleDate` = '".addslashes($feed_array['13'][$counter])."',`soldOut` =  '".addslashes($feed_array['14'][$counter])."',`eventImageURL` =   '".addslashes($feed_array['15'][$counter])."',`perfID`='".addslashes($feed_array['16'][$counter])."' WHERE `perfID` = ".$feed_array['16'][$counter]." LIMIT 1;";
      echo "UPDATE ".$feed_array['16'][$counter].": ".addslashes($feed_array['3'][$counter])."\n";
     } else {
      $eventDate=explode("/", $feed_array['1'][$counter]); //print_r($eventDate);
      $eventTime=explode(":", $feed_array['2'][$counter]); //print_r($eventTime);  
      $eventUnixTime=mktime($eventTime[0], $eventTime[1], "00", $eventDate[1], $eventDate[0], $eventDate[2]);
      $sql_exec[] = "INSERT INTO  `it_raw` (`id` ,`eventtime` ,`eventname` ,`venuename` ,`venueregion` ,`venuepostcode` ,`country` ,`minprice` ,`available` ,`link` ,`eventtype` ,`seaOnSaleDate` ,
        `perOnSaleDate` ,`soldOut` ,`eventImageURL` ,`perfID`) VALUES ( NULL ,'".$eventUnixTime."','".addslashes($feed_array['3'][$counter])."','".addslashes($feed_array['4'][$counter])."','".addslashes($feed_array['5'][$counter])."','".addslashes($feed_array['6'][$counter])."','".addslashes($feed_array['7'][$counter])."','".addslashes($feed_array['8'][$counter])."','".addslashes($feed_array['9'][$counter])."','".addslashes($feed_array['10'][$counter])."','".addslashes($feed_array['11'][$counter])."','".addslashes($feed_array['12'][$counter])."','".addslashes($feed_array['13'][$counter])."','".addslashes($feed_array['14'][$counter])."','".addslashes($feed_array['15'][$counter])."','".addslashes($feed_array['16'][$counter])."');";

      //mysql_query($sql) or die(mysql_error().":".$sql);
      echo "Inserted ".$feed_array['16'][$counter].": ".addslashes($feed_array['3'][$counter])."\n";
     }
     unset($sql);
     $counter++;  
    }

    foreach($sql_exec as $value) {
     mysql_query($value) or die (mysql_error().": ".$value); 
    }
+1  A: 

You can do a number of things.

Try This. I'm unable to test it but the syntax should be right.

$counter = 0;
$events = sizeof($feed_array['1']) - 1;
while($counter <= $events )
{
  $eventDate = explode("/", $feed_array['1'][$counter]); //print_r($eventDate);
  $eventTime = explode(":", $feed_array['2'][$counter]); //print_r($eventTime);             
  $eventUnixTime = mktime($eventTime[0], $eventTime[1], "00", $eventDate[1], $eventDate[0], $eventDate[2]);

  $data = array(
    'eventtime'     => $eventUnixTime,
    'eventname'     => addslashes($feed_array['3'][$counter]),
    'venuename'     => addslashes($feed_array['4'][$counter]),
    'venueregion'   => addslashes($feed_array['5'][$counter]),
    'venuepostcode' => addslashes($feed_array['6'][$counter]),
    'country'       => addslashes($feed_array['7'][$counter]),
    'minprice'      => addslashes($feed_array['8'][$counter]),
    'available'     => addslashes($feed_array['9'][$counter]),
    'link'          => addslashes($feed_array['10'][$counter]),
    'eventtype'     => addslashes($feed_array['11'][$counter]),
    'seaOnSaleDate' => addslashes($feed_array['12'][$counter]),
    'perOnSaleDate' => addslashes($feed_array['13'][$counter]),
    'soldOut'       => addslashes($feed_array['14'][$counter]),
    'eventImageURL' => addslashes($feed_array['15'][$counter]),
    'perfID'        => addslashes($feed_array['16'][$counter]),
  );

  $update = array();
  foreach ($data as $key => $value)
    $update[] = "`$key` = '$value'";

  $sql = "INSERT INTO `dante_tickets`.`it_raw`" .
    '(`id`, `'. implode ('`,`', array_keys($data)) . '`) VALUES ' .
    '(NULL, ' . implode (',', $data) . ') ON DUPLICATE KEY UPDATE ' . 
    implode (',', $update);

  mysql_query($sql) or die(mysql_error().":".$sql);
  echo "Inserted or Updated".$feed_array['16'][$counter].": ".addslashes($feed_array['3'][$counter])."\n";

  unset($sql);
  $counter++;     
}

I forgot to mention this requires that perfID is a unique key.

gradbot
Do you have any examples of using stored procedures. Also the PDO i'm not so sure about :/ i'll wait and see if something else comes up before attempting that.
Shadi Almosri
Why do you believe a stored procedure would increase its performance? Can you cite an example with data which supports this?
MarkR
@MarkR The stored procedure would turn his multiple queries into a single query. This alone would speed it up however using on duplicate key eliminates this.
gradbot
so is ON DUPLICATE KEY defiantly faster?
Shadi Almosri
cool - for now my method is quick enough - but your code is cleaner - so for the next revision i'll try it. Pity stack overflow doesn't have a "assisted answer" feature.
Shadi Almosri
why not use REPLACE INTO or INSERT IGNORE INTO?
Werner
+1  A: 

You could try grouping inserts and updates into groups so the code runs less queries.

For example, you could group all of the inserts into one very large insert, or maybe group every 100 inserts.

Also using prepared statements as gradbot suggested may help.

Other than that, it's not very easy to say which part of it is the major contributor to slowness. You should use a profiler to determine that, for example by using a smaller dataset so the profiled script runs in a reasonable time.

Jani Hartikainen
A: 
  1. Have you profiled this query?

    "SELECT * FROM it_raw WHERE perfID = '".addslashes($feed_array['16'][$counter])."'"

    Because you run it 28000 times.. so unless it is REALLY quick, it will cause you a head ache. Use the EXPLAIN syntax and add an appropriate index if needed.

    EDIT: With profile, I mean that you should try to use EXPLAIN on the mysql-prompt to see what execution plan the MySQL Query Optimizer suggests for this query. I.e, on the prompt, run:

    EXPLAIN SELECT * FROM it_raw WHERE perfID = 426; 
    # Change this id to something existing and valid
    

    What you want to see is that it is using an index, and ONLY an index. If you don't understand the output, copy and paste it here so can I go through it with you.

    UPDATE: As you can see, it takes 0.07 seconds for EVERY row in your data array, plus the time to actually query the database, transfer the result etc. This is roughly 28000 * 0.07 = 1960 seconds, or 32 minutes, just to check if the data exists or not. You need to come up with another way of checking if the data already exists... One, very simple optimization might be:

    EXPLAIN SELECT perfId FROM it_raw WHERE perfID = 210968;
    

    This way, you can use the index on perfId and don't need to visit the table

  2. If it is possible, try to avoid quering the database for each run in the loop. Perhaps it is possible to fetch the ids from the database into a big array of ids that will fit in PHP memory? This will be MUCH quicker than quering the database for every row in your big data array.

PatrikAkerstrand
Can you explain what you mean by profiled? (echo micro time before and after?)
Shadi Almosri
added an explain output (although using phpmyadmin - not sure if it effects what you see [i can use shell if needed]) in my question
Shadi Almosri
Updated with shell out put and time...
Shadi Almosri
No something isn't right with your update, as running that whole script with just the SELECT statements takes ~1min...
Shadi Almosri
40seconds to be precise.
Shadi Almosri
A: 

This is exactly the scenario for which prepared statements were made:

$prepared_statement =
    $DB->prepare('INSERT INTO table(column, column) VALUES(?, ?)');
loop {
    $prepared_statement->execute(array('value1', 'value2');
}

It's implemented in the MySQLi and PDO wrappers. It only compiles the query once and automagically sanitizes the given data, saving time (both developmental and executional) and headache.

orlandu63