views: 147 · answers: 2

I have a long MySQL queue and one worker script that processes each queue item.

But while this worker is running, the database may be updated or receive new row inserts.

An example worker script (pseudocode):

get_current_queue = SELECT * FROM queue ...

while (get_current_queue) {

    update current row in queue to "processing"

    // some CPU-intensive processing here that takes a varying amount of time

}

The problem is that the worker script takes a different amount of time depending on how long the queue is at that moment and how long each CPU-bound job takes (converting videos, for example).

So when I run another worker script while the first one is running, the rows not yet marked "processing" in the queue table by the first worker will fall onto the second worker's todo list.

I don't know how to approach this problem.

When a worker runs, I need some way to mark its batch so only that worker will process it.

And while that worker is running, if new rows are inserted and I choose to start another worker, it should be able to pick them up.

+1  A: 

Devote one field in the queue table to the id of the worker currently processing the row.

First do UPDATE queue SET worker_id = myid WHERE worker_id = '' LIMIT 100, then SELECT * FROM queue WHERE worker_id = myid and process those rows. Afterwards, remove those rows from the queue or mark them as processed.

You might need some kind of fall-back to cover the situation where one of your workers dies while it is processing: unlock its unprocessed rows by setting worker_id back to ''.
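A minimal sketch of this claim-then-select pattern, using Python's sqlite3 as a stand-in for MySQL (table and column names here are illustrative, and note that MySQL supports UPDATE ... LIMIT directly, while SQLite needs a subquery to get the same effect):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, payload TEXT, "
    "worker_id TEXT DEFAULT '')"
)
conn.executemany(
    "INSERT INTO queue (payload) VALUES (?)",
    [("job%d" % i,) for i in range(5)],
)
conn.commit()

my_id = "worker-1"

# Step 1: atomically claim up to 3 unclaimed rows for this worker.
# In MySQL this would simply be:
#   UPDATE queue SET worker_id = %s WHERE worker_id = '' LIMIT 3
conn.execute(
    "UPDATE queue SET worker_id = ? "
    "WHERE id IN (SELECT id FROM queue WHERE worker_id = '' LIMIT 3)",
    (my_id,),
)
conn.commit()

# Step 2: select only the rows this worker claimed, then process them.
batch = conn.execute(
    "SELECT id, payload FROM queue WHERE worker_id = ?", (my_id,)
).fetchall()
for row_id, payload in batch:
    # ... CPU-intensive processing would happen here ...
    # Step 3: remove the row (or mark it processed) once done.
    conn.execute("DELETE FROM queue WHERE id = ?", (row_id,))
conn.commit()
```

Because the claim happens in a single UPDATE, a second worker running the same statement concurrently can only grab rows the first worker did not already mark, which is exactly the isolation the question asks for.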

Kamil Szot
+1  A: 

Kamil has a good answer. I would extend it by suggesting the workers get only one row at a time. That way the queue is processed closer to its original order, and if the operations in the middle really do take a long time compared to retrieving a row from the database, you don't need to fetch more than one at a time.

This also makes it easier to check whether a worker has died, since each worker can only have one task at any given time.

I implemented a system like this with a bunch of machines running ImageMagick processing over 250,000 images each weekend. (That's when the jobs came in.) Then I could shut down the workers during the week when there was nothing to do and fire them up as the workload increased. Worked like a champ.
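The one-row-at-a-time variant can be sketched the same way, again with sqlite3 standing in for MySQL and illustrative names; each iteration atomically claims the oldest unclaimed row, processes it, and removes it:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE queue (id INTEGER PRIMARY KEY, payload TEXT, "
    "worker_id TEXT DEFAULT '')"
)
conn.executemany(
    "INSERT INTO queue (payload) VALUES (?)",
    [("video%d" % i,) for i in range(3)],
)
conn.commit()

def claim_one(conn, my_id):
    """Claim the oldest unclaimed row; return (id, payload) or None if empty.

    In MySQL the UPDATE would be:
      UPDATE queue SET worker_id = %s WHERE worker_id = '' ORDER BY id LIMIT 1
    """
    cur = conn.execute(
        "UPDATE queue SET worker_id = ? "
        "WHERE id = (SELECT id FROM queue WHERE worker_id = '' "
        "ORDER BY id LIMIT 1)",
        (my_id,),
    )
    conn.commit()
    if cur.rowcount == 0:
        return None  # nothing left to claim
    return conn.execute(
        "SELECT id, payload FROM queue WHERE worker_id = ?", (my_id,)
    ).fetchone()

my_id = str(uuid.uuid4())  # a unique id per worker process
processed = []
while True:
    row = claim_one(conn, my_id)
    if row is None:
        break
    processed.append(row[1])  # CPU-intensive work would go here
    conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
    conn.commit()
```

Since a worker holds at most one claimed row, a watchdog only has to reset a single worker_id (as Kamil suggests) when a worker dies, and rows are consumed in insertion order.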

Mark Moline
Okay, so you mean a worker for each row? What is the maximum threshold of workers running at any given time? If there are a lot of workers, each running only one row, wouldn't that be more inefficient? I'd like to hear some more details about this; it's interesting.
ggggggggg