When there are multiple PHP scripts running in parallel, each making an UPDATE query to the same record in the same table repeatedly, is it possible for there to be a 'lag time' before the table is updated with each query?

I have basically 5-6 instances of a PHP script running in parallel, having been launched via cron. Each script gets all the records in the items table, and then loops through them and processes them.

However, to avoid processing the same item more than once, I store the id of the last item being processed in a separate table. So this is how my code works:

function getCurrentItem()
{
  $sql = "SELECT currentItemId from settings";
  $result = $this->db->query($sql);
  return $result->get('currentItemId');
}

function setCurrentItem($id)
{
   $sql = "UPDATE settings SET currentItemId='$id'";
   $this->db->query($sql);
}

$currentItem = $this->getCurrentItem();

$sql = "SELECT * FROM items WHERE status='pending' AND id > $currentItem";
$result = $this->db->query($sql);
$items = $result->getAll();

foreach ($items as $i)
{
   //Check if $i has been processed by a different instance of the script, and if so, 
   //leave it untouched.
   if ($this->getCurrentItem() > $i->id) 
     continue;

   $this->setCurrentItem($i->id);
   // Process the item here
}

But despite all these precautions, most items are being processed more than once. This makes me think that there is some lag between the PHP script running an UPDATE query and the database actually updating the record.

Is it true? And if so, what other mechanism should I use to ensure that the PHP scripts always get only the latest currentItemId even when there are multiple scripts running in parallel? Would using a text file instead of the db help?

+1  A: 

If these scripts run in parallel, nothing in this code prevents a race condition between reading currentItemId and updating it:

script1:

getCurrentItem() yields Id 1234
...context switch to script2, before script 1 gets to run its update statement.

script2: 
getCurrentItem() yields Id 1234

And both scripts process Id 1234

You want checking and updating the status of an item to be a single all-or-nothing operation. You don't need the settings table for that; you'd do something like this instead (pseudo code):

SELECT * FROM items WHERE status='pending'

foreach ($items as $i) {
  // Claim the item: only one script's UPDATE can still match status='pending'
  $rows = update items set status='processing' where id = $i->id and status='pending';
  if ($rows == 0) // someone beat us to it and is already processing the item
    continue;
  // process item...
  update items set status='done' where id = $i->id;
}
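
In PHP, the claim step above could look roughly like the following sketch. It assumes PDO, which the question does not use (the `$this->db` wrapper there is unknown), and an in-memory SQLite database standing in for the real one so the snippet runs on its own. `claimItem` is a hypothetical helper name. `PDOStatement::rowCount()` reports how many rows the UPDATE changed.

```php
<?php
// Sketch only: PDO and the in-memory SQLite database are assumptions made so
// the example is self-contained; the question's own $this->db wrapper is unknown.

// Hypothetical helper: try to claim one item. Returns true only for the
// caller whose UPDATE actually flipped the row from 'pending' to 'processing'.
function claimItem(PDO $db, int $id): bool
{
    $stmt = $db->prepare(
        "UPDATE items SET status = 'processing' WHERE id = ? AND status = 'pending'"
    );
    $stmt->execute([$id]);
    return $stmt->rowCount() === 1; // 0 rows: another script got there first
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT NOT NULL)");
$db->exec("INSERT INTO items (id, status) VALUES (1, 'pending'), (2, 'pending')");

$ids = $db->query("SELECT id FROM items WHERE status = 'pending' ORDER BY id")
          ->fetchAll(PDO::FETCH_COLUMN);

$processed = [];
foreach ($ids as $id) {
    if (!claimItem($db, (int) $id)) {
        continue; // already claimed by another instance
    }
    // Process the item here, then mark it finished.
    $processed[] = (int) $id;
    $done = $db->prepare("UPDATE items SET status = 'done' WHERE id = ?");
    $done->execute([(int) $id]);
}
```

Because the check and the status change happen in one UPDATE statement, two scripts cannot both see `rowCount()` return 1 for the same id.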
nos
In your code, where is `$rows` being set from? It's next to an UPDATE query. Can you elaborate?
Well, I don't know much PHP. The idea is that issuing an UPDATE returns the number of rows that were affected. If it's 0, the `where status='pending'` condition didn't match because someone else changed the status in the meantime.
nos
Apart from checking the rows affected by the query, is there any other, more foolproof solution, such as using text files?
Text files would be rather fragile. Checking the number of affected rows is foolproof as long as the claim is a single UPDATE statement. However, if your script fails or crashes after that, you're left with rows stuck in the 'processing' status. To guard against that as well, you'd probably need transactions and InnoDB tables.
nos
+1  A: 

What you need is for any thread to be able to:

  • find a pending item
  • record that that item is now being worked on (in the settings table)

And it needs to do both of those in one go, without any other thread interfering half-way through.

I recommend putting the whole SQL in a stored procedure; that will be able to run the entire thing as a single transaction, which makes it safe from competing threads.
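
As a rough illustration of the "find and record in one go" requirement, the PHP sketch below does both steps inside one transaction issued from the client rather than in a stored procedure, which is a substitution made here for the sake of a runnable example. It also records the claim in the item's own status column instead of the settings table, another simplifying assumption. PDO, the in-memory SQLite database and the `claimNextItem` helper are all hypothetical and not part of the original code.

```php
<?php
// Sketch only: PDO, the in-memory SQLite database and claimNextItem() are
// assumptions for illustration, not part of the original code.

// Find the lowest pending item and mark it 'processing' in one transaction.
// Returns the claimed id, or null when no pending items are left.
function claimNextItem(PDO $db): ?int
{
    while (true) {
        $db->beginTransaction();
        try {
            $id = $db->query(
                "SELECT id FROM items WHERE status = 'pending' ORDER BY id LIMIT 1"
            )->fetchColumn();

            $claimed = false;
            if ($id !== false) {
                // The status check keeps the claim safe even if another
                // script read the same id before this UPDATE ran.
                $stmt = $db->prepare(
                    "UPDATE items SET status = 'processing' WHERE id = ? AND status = 'pending'"
                );
                $stmt->execute([$id]);
                $claimed = $stmt->rowCount() === 1;
            }
            $db->commit();
        } catch (Throwable $e) {
            $db->rollBack();
            throw $e;
        }

        if ($id === false) {
            return null;          // nothing left to claim
        }
        if ($claimed) {
            return (int) $id;     // this caller owns the item now
        }
        // Lost the race for this id; loop and look for the next pending item.
    }
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT NOT NULL)");
$db->exec("INSERT INTO items (id, status) VALUES (1, 'done'), (2, 'pending'), (3, 'pending')");

$first  = claimNextItem($db);
$second = claimNextItem($db);
$third  = claimNextItem($db);
```

Each worker would call `claimNextItem()` in a loop until it returns null, instead of fetching the whole list of pending items up front. On MySQL this assumes InnoDB tables, since MyISAM does not support transactions.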

vincebowdren