views:

1057

answers:

4

There is a small system where a database table is used as a queue on MSSQL 2005. Several applications write to this table, and one application reads and processes the rows in a FIFO manner.

I have to make it a bit more advanced so that it can become a distributed system where several processing applications run. The goal is that 2-10 processing applications should be able to run without interfering with each other.

My idea is to extend the queue table with a column showing that a process is already working on a row. The processing application will first update the table with its identifier, and then select the records it has just claimed.

So something like this:

begin transaction
update top (10) queue set processing = 'myid' where processing is null
select * from queue where processing = 'myid'
commit transaction

After processing, the application sets the processing column of those rows to something else, like 'done'.
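To make this concrete, here is a minimal sketch of what the table and the post-processing update might look like (the id and payload columns are assumptions for illustration; only the processing column is described above):

create table queue (
    id int identity(1,1) primary key,  -- hypothetical key column
    payload nvarchar(max) null,        -- hypothetical data column
    processing varchar(50) null        -- null = waiting, 'myid' = claimed, 'done' = finished
)

-- after a batch has been processed successfully
update queue set processing = 'done' where processing = 'myid'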

I have three questions about this approach.

First: can this work in this form?

Second: if it works, is it efficient? Do you have any other ideas for distributing the processing like this?

Third: In MSSQL locking is row-based, but after a certain number of rows are locked, the lock is escalated to the whole table, so the second application cannot access the table until the first application releases its transaction. How large can the selection (top x) be so that only row locks are taken and the whole table is not locked?

+1  A: 

This approach looks reasonable to me, and is similar to one I have used in the past - successfully.

Also, the row/table lock will only be held while the update and select operations take place, so I doubt the row vs. table question is really a major consideration.

Unless the processing overhead of your app is so low as to be negligible, I'd keep the "top" value low - perhaps just 1. Of course that entirely depends on the details of your app.

Having said all that, I'm not a DBA, and so will also be interested in any more expert answers.

Phil Nash
+5  A: 

This will work, but you'll probably find you'll run into blocking or deadlocks where multiple processes try to read/update the same data. I wrote a procedure to do exactly this for one of our systems, which uses some interesting locking semantics to ensure this type of thing runs with no blocking or deadlocks, described here.
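For reference, a minimal sketch of the kind of locking hints such a dequeue typically relies on (this is not the exact procedure from the linked post; it reuses the queue table and processing column from the question):

begin transaction

-- readpast skips rows already locked by other readers,
-- updlock claims the selected rows so no other process can take them,
-- rowlock asks for row-level rather than page- or table-level locks
update top (1) queue with (rowlock, updlock, readpast)
set processing = 'myid'
output inserted.*  -- returns the claimed row(s) in the same statement
where processing is null

commit transaction

Because claiming and reading happen in a single statement, there is no window in which two processes can grab the same row.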

Greg Beech
This is very good, exactly what I need. Thanks.
Biri
Nice. I said I'd be interested ;-)
Phil Nash
One thing to watch out for is unprocessed work, where a system dequeues an item but crashes or stops before it has a chance to process it. A timeout before requeuing the work would solve this, but it can get quite messy (see the sketch below).
marshall
abigblackman: Yes, I'm aware of this. I'm planning some kind of communication between the services, like "I'm alive", so that if one breaks, the others can take over the work it started.
Biri
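A minimal sketch of the timeout-based requeue mentioned above, assuming a hypothetical claimed_at column is added to the queue table to record when a row was claimed:

-- record the claim time when taking work (claimed_at is an assumed extra column)
update top (10) queue
set processing = 'myid', claimed_at = getdate()
where processing is null

-- periodically release rows that were claimed too long ago but never finished
update queue
set processing = null, claimed_at = null
where processing is not null
  and processing <> 'done'
  and claimed_at < dateadd(minute, -5, getdate())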
Updated blog link: http://gregbeech.com/blog/retrieving-a-row-exactly-once-with-multiple-polling-processes-in-sql-server
James Dunne
+1  A: 

Regarding your question about locking: you can use a locking hint to force it to lock only rows:

update mytable with (rowlock) set x=y where a=b
marshall
Yes, I know this, but regardless, after a while it switches to table locking. I guess this comes down to a speed trade-off: is it worth locking the table instead of keeping and handling a list of row locks?
Biri
AFAIK rowlock is a hint. It does not force SQL to restrict locking to row level, but only indicates that a row lock would be preferred to a page lock or table lock. SQL will still escalate to a page or table lock at its own discretion.
My Other Me
+1  A: 

The biggest problem with this approach is that you increase the number of updates to the table. Try this with just one process consuming (update + delete) and others inserting data into the table, and you will find that at around a million records it starts to crumble.

I would rather have one consumer for the DB and use message queues to deliver processing data to other consumers.