views:

67

answers:

2

We have an interesting problem:

We have a JMS queue of job statuses, and two identical processes pulling from the queue to persist the statuses via JDBC. When a job status is pulled from the queue, the database is checked to see if there is already a row for the job. If so, the existing row is updated with new status. If not, a row is created for this initial status.

What we are seeing is that a small percentage of new jobs are being added to the database twice. We are pretty sure this is because the job's initial status is quickly followed by a status update - one process gets one, another process the other. Both processes check to see if the job is new, and since it has not been recorded yet, both create a record for it.

So, my question is, how would you go about preventing this in a vendor-neutral way? Can it be done without locking the entire table?

EDIT: For those saying the "architecture" is unsound - I agree, but am not at liberty to change it.

+4  A: 

Create a unique constraint on JOB_ID, and retry to persist the status in the event of a constraint violation exception.

That being said, I think your architecture is unsound: If two processes are pulling messages from the queue, it is not guaranteed they will write them to the database in queue order: one consumer might be a bit slower, a packet might be dropped, ..., causing the other consumer to persist the later messages first, causing them to be overridden with the earlier state.

One way to guard against that is to include sequence numbers in the messages, update the row only if the sequence number is as expected, and delay the update otherwise (this is vulnerable to lost messages, though ...).

Of course, the easiest way would be to have only one consumer ...

meriton
Yes, we have examples where the second message is persisted after the first as you describe.
SingleShot
A: 

JDBC connections are not thread safe, so there's nothing to be done about that.

"...two identical processes pulling from the queue to persist the statuses via JDBC..."

I don't understand this at all. Why two identical processes? Wouldn't it be better to have a pool of message queue listeners, each of which would handle messages landing on the queue? Each listener would have its own thread; each one would be its own transaction. A Java EE app server allows you to configure the size of the message listener pool to match the load.

I think a design that duplicates a process like this is asking for trouble.

You could also change the isolation level on the JDBC connection. If you make it SERIALIZABLE you'll ensure ACID at the price of slower performance.

Since it's an asynchronous process, performance will only be an issue if you find that the listeners can't keep up with the messages landing on the queue. If that's the case, you can try increasing the size of the listener pool until you have adequate capacity to process the incoming messages.

duffymo
I believe this is analagous to what we have, but substitutes threads for processes, yielding the same issues. Not true? (Not that we can change our design.)
SingleShot
But you don't have to maintain two processes, and you can control the size of the thread pool more easily, because it's under the control of the app server. How will you increase capacity if two processes aren't enough? Add more processes? The designs are not equivalent, even if the issues are the same for both. I think increased isolation is one part of the solution.
duffymo