Hi there, I have a software system that performs OCR on multiple machines simultaneously. The current system works as follows:

  1. All documents that need to be OCRed are inserted into a table in the database.
  2. Each client OCR machine polls that table; whenever it finds documents waiting for OCR, it locks the table and picks n files to process. The lock is used for atomicity.
  3. After each document is OCRed, its status is updated to complete.

I know it is a serious mistake to use a database as a synchronization point. It runs fine, but sometimes I see deadlocks on the database.

So my question is: what is a better way to design such a system? I want the database to be a storage device only, not a synchronization point. I would like to hear your thoughts.

+3  A: 

Well, you could have a column in the table which says whether the record is currently being processed. Within a transaction, fetch the data for a record which isn't currently being processed, and update the record to say that it's now being processed. The details of how contention will be handled there will depend on the kind of transactions you create and the database you use, but I suspect that transactions should be at the heart of it.
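
As a rough illustration of the claim-inside-a-transaction idea, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `documents`, the `status` and `worker` columns, and the helper names are all assumptions made for the example, not part of the original system; other databases would need their own locking and isolation choices.

```python
import sqlite3


def make_demo_db(n_docs):
    """Create an in-memory demo table (hypothetical schema) with n_docs pending rows."""
    # isolation_level=None gives us manual control over BEGIN/COMMIT.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE documents ("
        "id INTEGER PRIMARY KEY, path TEXT, "
        "status TEXT NOT NULL DEFAULT 'pending', worker TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents (path) VALUES (?)",
        [(f"doc{i}.tif",) for i in range(1, n_docs + 1)],
    )
    return conn


def claim_documents(conn, worker_id, batch_size):
    """Mark up to batch_size pending rows as 'processing' and return their ids."""
    # BEGIN IMMEDIATE takes the write lock up front, so two workers cannot
    # select the same pending rows. The transaction covers only the claim,
    # not the OCR work itself.
    conn.execute("BEGIN IMMEDIATE")
    try:
        rows = conn.execute(
            "SELECT id FROM documents WHERE status = 'pending' "
            "ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        ids = [row[0] for row in rows]
        conn.executemany(
            "UPDATE documents SET status = 'processing', worker = ? WHERE id = ?",
            [(worker_id, doc_id) for doc_id in ids],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return ids
```

Each worker would call `claim_documents` to take a batch, run OCR outside any transaction, and then update the status of each row to complete in a separate short transaction.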

That's assuming you really want to use a database rather than a message queue of some description. You might consider using a message queue in conjunction with the database... and some databases have queues built into them, which could be useful too. Even if you wanted the record in the database as well, you could have a queue just of the IDs - clients could just pull the next item from the queue, then fetch the data. You may still want to record the time at which the item was pulled from the queue, so that if the client crashes or something like that, a batch job can put any failed jobs (e.g. ones which were picked up a day ago but don't have results yet) back in the queue.
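
The queue-of-IDs idea, with a recorded pull time and a batch job that re-queues abandoned work, might look roughly like the sketch below. It uses Python's in-process `queue.Queue` purely as a stand-in for a real message queue, and the class and method names (`IdQueue`, `pull`, `complete`, `requeue_stale`) are hypothetical; a real deployment would keep the pull times somewhere durable, such as the database.

```python
import queue
import time


class IdQueue:
    """Toy queue of document IDs that remembers when each ID was pulled."""

    def __init__(self):
        self._queue = queue.Queue()
        self._pulled_at = {}  # doc_id -> time the ID was handed to a client

    def put(self, doc_id):
        self._queue.put(doc_id)

    def pull(self, now=None):
        """Return the next document ID, or None if the queue is empty."""
        try:
            doc_id = self._queue.get_nowait()
        except queue.Empty:
            return None
        self._pulled_at[doc_id] = time.time() if now is None else now
        return doc_id

    def complete(self, doc_id):
        """Forget the pull time once the client has stored its results."""
        self._pulled_at.pop(doc_id, None)

    def requeue_stale(self, max_age_seconds, now=None):
        """Batch job: put back IDs pulled longer ago than max_age_seconds."""
        now = time.time() if now is None else now
        stale = [d for d, t in self._pulled_at.items() if now - t > max_age_seconds]
        for doc_id in stale:
            del self._pulled_at[doc_id]
            self._queue.put(doc_id)
        return stale
```

A client would call `pull`, fetch the document data by ID from the database, run OCR, and call `complete`; a scheduled job would call something like `requeue_stale(24 * 60 * 60)` to recover items picked up a day ago that never produced results.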

Jon Skeet
Thanks for your prompt reply. I have actually done something similar to what you describe in your first paragraph, but I am not satisfied with this solution.
Int3
@Int3: Why not? I'm not suggesting keeping the transaction open while processing the data - just while marking it as in progress.
Jon Skeet
+2  A: 

Rather than polling the database for files to OCR, it is better to use the Windows message queuing service. What if the database is down while your OCR service is running? With polling, the OCR service cannot start working until the database service is back up. With a Windows message queue, the OCR service can get the information about each file from the messaging service (online or offline), so it can start automatically once the machine is up, and there will be no deadlocking on the database.

Syntax
MSMQ sounds good; a message queue is also what Jon Skeet suggested.
Int3