I'd just like to check, is this a good way of iterating over rows in a database and deleting them when processed? I first had

  • Grab all
  • Process all
  • Delete all

But I figured that might lead to problems if new rows were added (from outside) while the "Process all" step is running.

// the limited for loop is to prevent an infinite loop
// assume the limit to be much higher than the expected
// number of rows to process
for ($i = 0; $i < $limit; $i++) 
{

    // get next mail in queue
    // (`to` is a reserved word in MySQL, so it must be backtick-quoted)
    $st = $db->prepare('
        SELECT id, `to`, subject, message, attachment
        FROM shop_mailqueue
        LIMIT 0,1
    ');
    $st->execute();
    $mail = $st->fetch(PDO::FETCH_ASSOC);

    // if no more mails in queue, stop
    if ($mail === false) {
        break;
    }

    // send queued mail
    $mailer = new PHPMailer();
    $mailer->AddAddress($mail['to']);
    $mailer->SetFrom('[email protected]', 'xxx.nl');
    $mailer->Subject = $mail['subject'];
    $mailer->Body = $mail['message'];
    $mailer->AddAttachment($mail['attachment']);
    $mailer->Send();

    // remove mail from queue
    $st = $db->prepare('
        DELETE FROM shop_mailqueue WHERE id = :id
    ');
    $st->bindValue(':id', $mail['id']);
    $st->execute();

}
A: 

You can lock the table during processing:

http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html

Lliane
Yes, but that will keep the site (which adds rows to the table) waiting while this process is running. Since this script is sending (sometimes large) email attachments, it can take some time.
Bart van Heukelom
+2  A: 

I think your grab all, process all, delete all approach is fine with a simple change:

  • Instead of doing a simple delete from table, do a delete from table where key in (<all keys retrieved>).

You can easily build the <all keys retrieved> because you first grab everything.

This will perform (a lot) better than doing many queries as you plan to do now.
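A minimal sketch of this approach, assuming the same `$db` PDO connection and `shop_mailqueue` table from the question; `sendQueuedMail()` is a hypothetical helper standing in for the PHPMailer code:

```php
<?php
// Sketch of the grab-all / process-all / delete-by-key approach.
// Assumes $db is an open PDO connection; sendQueuedMail() is a
// hypothetical helper wrapping the PHPMailer code from the question.

// 1. Grab all rows currently in the queue.
$rows = $db->query('SELECT id, `to`, subject, message, attachment FROM shop_mailqueue')
           ->fetchAll(PDO::FETCH_ASSOC);

if (count($rows) > 0) {
    // 2. Process all.
    foreach ($rows as $mail) {
        sendQueuedMail($mail);
    }

    // 3. Delete only the keys we actually retrieved, using one
    //    bound placeholder per id to stay injection-safe.
    $ids          = array_column($rows, 'id');
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $st = $db->prepare("DELETE FROM shop_mailqueue WHERE id IN ($placeholders)");
    $st->execute($ids);
}
```

Rows inserted after the initial SELECT are untouched by the DELETE, so they are simply picked up on the next run.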

Vinko Vrsalovic
I had half written the same solution when this one turned up. Ideally you want to put the results from the select statement into a suitable data structure which you can then use for the process-all and delete-all stages.
pjp
I would be concerned with this that if you terminate it before it finishes or it fails halfway through, you could end up re-processing queue items the next time you start it.
Tom Haigh
True. You can alleviate that by running an update table set processed = 1 where key in (<all keys retrieved>) in the same transaction as the select. Having 4 queries will still be faster for a relatively big N.
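For reference, the mark-then-delete variant described here might look like this; the `processed` column is an assumed addition to the table, not something from the question:

```php
<?php
// Sketch: select and mark rows in one transaction, so a crash
// mid-send can be detected on the next run. Assumes a hypothetical
// `processed` TINYINT column has been added to shop_mailqueue.
$db->beginTransaction();

$rows = $db->query('SELECT id, `to`, subject, message, attachment
                    FROM shop_mailqueue WHERE processed = 0')
           ->fetchAll(PDO::FETCH_ASSOC);

$ids = array_column($rows, 'id');
if (count($ids) > 0) {
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $db->prepare("UPDATE shop_mailqueue SET processed = 1
                  WHERE id IN ($placeholders)")
       ->execute($ids);
}

$db->commit();

// ... send the mails, then delete the processed rows afterwards.
```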
Vinko Vrsalovic
I'm not really concerned about the performance. If it's going to email attachments of several MB's, the query speed is pretty irrelevant.
Bart van Heukelom
+1  A: 

First and foremost, any time you say, "Iterating Over Rows From a Database," you better be talking about some seriously complex operation happening for each row. Otherwise, you can and should think in sets!

It looks like you're taking everything in the mail queue and dumping the table after sending all of the emails. Now, I should note that your limit 0,1 here is a bit disconcerting, since you have no ORDER BY. No database stores rows in a reliable order (and for good reason: it has to constantly move things around to optimize them). So, a better version would be:

// get next mail in queue
$st = $db->prepare('
    SELECT id, `to`, subject, message, attachment
    FROM shop_mailqueue
    ORDER BY id ASC
    LIMIT 0, ' . (int)$limit);
$st->execute();

$lastid = 0;

while ($mail = $st->fetch(PDO::FETCH_ASSOC))
{    
    // send queued mail
    $mailer = new PHPMailer();
    $mailer->AddAddress($mail['to']);
    $mailer->SetFrom('[email protected]', 'xxx.nl');
    $mailer->Subject = $mail['subject'];
    $mailer->Body = $mail['message'];
    $mailer->AddAttachment($mail['attachment']);
    $mailer->Send();

    $lastid = $mail['id'];
}


// remove mail from queue
$st = $db->prepare('DELETE FROM shop_mailqueue WHERE id <= ' . (int)$lastid);
$st->execute();

This way, you're only making two (count 'em!) queries to the database, which is far more efficient than pulling rows back one at a time.

Now, if the whole $limit deal is unnecessary, and you really want to pull back all of the rows in the queue and then just dump the queue out, change your delete statement to:

TRUNCATE TABLE shop_mailqueue

See, truncate isn't bound by transaction logic; it just wipes the table clean, no questions asked. Therefore, it's immensely fast (as in milliseconds, regardless of table size). delete just doesn't have that kind of speed. Just remember that truncate is a nuclear option: once you truncate, the data is gone for good.

Eric
Just wanted to mention that you are relying on the insertion process never filling any gaps in the id column, should they occur.
Vinko Vrsalovic
Thank you for your concerns, but most are unjustified (partly because I didn't give enough information). First, the limit is there to prevent an infinite loop (in case of a bug). Assume that it's higher than the number of rows. I could fetch a number of rows at once, but then if the process crashes when halfway, half of them have to be processed again. I cannot truncate the table because rows might be added while the process is running. I also do not really care for the order in which the queue is processed.
Bart van Heukelom
A: 

What you are doing is fine, I think. You fetch what you have, process each row, and delete only the ones you processed. I also would not lock the table, because that would block whoever is creating the records.

If the only problem you have is processing the e-mails that may have been created while you were sending out the existing ones, then you could wrap your whole procedure in a while loop that keeps running as long as there are new records.

In addition, why fetch with LIMIT 0,1 up to $limit times, when you can modify the query to fetch at most $limit rows in one go with LIMIT 0,$limit?
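A rough sketch combining both suggestions (the outer loop and the batched fetch), assuming the same `$db` connection; `sendQueuedMail()` is a hypothetical helper wrapping the PHPMailer code, and the batch size is an assumption:

```php
<?php
// Sketch: drain the queue in batches of at most $limit rows,
// looping again if new rows arrived while the previous batch
// was being sent.
$limit = 100; // assumed batch size

do {
    $st = $db->prepare('SELECT id, `to`, subject, message, attachment
                        FROM shop_mailqueue
                        LIMIT 0, ' . (int)$limit);
    $st->execute();
    $batch = $st->fetchAll(PDO::FETCH_ASSOC);

    foreach ($batch as $mail) {
        sendQueuedMail($mail); // hypothetical helper

        // delete each row as soon as it is sent, as in the question
        $del = $db->prepare('DELETE FROM shop_mailqueue WHERE id = :id');
        $del->bindValue(':id', $mail['id']);
        $del->execute();
    }
} while (count($batch) > 0); // rows added meanwhile trigger another pass
```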

van