views:

108

answers:

2

I'm trying to think of an efficient way to allow a group of people to work through a queue of data entry tasks. Previously we've just had one person doing this so it hasn't been an issue. The back-end is an RDBMS and the front-end is a web-application.

Currently we do something like this:

To assign a record for editing:

SELECT * FROM records WHERE in_edit_queue LIMIT 1;

Then,

To save changes to a previously assigned record:

UPDATE records SET ..., in_edit_queue = false
  WHERE id = ? AND in_edit_queue = true;

This means it's possible for two users to be assigned the same record to edit, and we favor the first one that submits, failing silently on subsequent submissions, e.g.:

  1. User A loads up record 321 for editing
  2. User B loads up record 321 for editing
  3. User B submits changes (they are saved in the DB)
  4. User A submits changes (they are not saved in the DB)

(Note: We can trust all our users to submit acceptable data, so there is no need for us to keep the data from the second UPDATE.)

The problem with this method is when users start at the same time and edit at roughly the same speed, they are often updating the same records but only 1 of them is getting saved. In other words, wasting a lot of man-hours. I can mitigate this to some extent by picking random rows but I'd prefer something a bit more guaranteed.

So here's what I'm thinking...

Have a table called: locked_records (record_id integer, locked_until timestamp)

-- Assign a record for editing:
-- Same as before but also make sure the
-- record is not listed in locked_records...
SELECT * FROM records
  WHERE in_edit_queue AND id NOT IN (
    SELECT record_id FROM locked_records
    WHERE locked_until > now() )
  LIMIT 1;

-- ..and effectively remove it from
-- the queue for the next 5 minutes
INSERT INTO locked_records (record_id, locked_until)
  VALUES (?, now() + 300);

Then:

UPDATE records SET ..., in_edit_queue = false
  WHERE id = ? AND in_edit_queue = true;
DELETE FROM locked_records WHERE record_id = ?;

A typical edit takes about 30 seconds to 1 minute, 5 minutes out of the queue should be a good amount. I can also have an XHR on the web app keep updating the lock if it turned out to be advantageous.

Can anyone offer thoughts on this? Sound like a good way of doing things? Sound like a terrible way? Done this before? I'd love to hear some feedback.

Thanks! J

A: 

What about the RDBMSes internal list of locks? Would altering the SELECT statement to be SELECT FOR UPDATE be an option?

Karl
I don't think so, since the UPDATE will occur through a different database connection as it's a web app. Or not at all, as again.. it is a web app :-) If I'm misunderstanding something please let me know...
p.s. If you meant as an alternative to maintaining a separate locked_records table then yes I think that would work and be more elegant but the 'records' table is constantly being read-from quite heavily by non-editing users and I don't want to lock it for that reason.
The possibility that a different DB connection might be used would have to be evaluated; you're right that a SELECT FOR UPDATE wont work in that scenario.A SELECT FOR UPDATE shouldn't prevent access by readers in most RDBMSes, you'd have to check the docos for your one.
Karl
A: 

Another idea: these records have two additional columns: assigned_to and completed.

When someone wants to edit a record, do something like

update records set assigned_to = ? # assigning to 'me'
where assigned_to is null
and completed = false
limit 1 # only assign one record at a time

Then, to get that row back:

select ...
from records
where assigned_to = ? # assigned to 'me'
and completed = false

And once you're done you set the completed to 'true'.

You could have an additional timestamp column for when a record was assigned to someone, and then add an OR alternative to the "assigned_to is null" part of the where clause in the update statement above, where you require a certain recensy for an assignment to be valid.

Christian Vest Hansen