views:

210

answers:

10

I'm working with an application that adds a new row to the database, based on the last row meeting a certain criteria. Is there a standard pattern for dealing with this type of problem, or do I simply need to lock the table?

Here is an over-simplified visualization:

A1
A2
A3
B1
B2

Using the visualization above, a web page loads up the highest "B" value, which is "2". Then, after some time, it wants to insert B3, the next record in the series. However, it has to check to make sure that someone else didn't do the same thing.

Like I mentioned, I know that I can read the expected value within a transaction, or I could lock the table, or possibly even the last row. What I'm asking is if there is what the recommended strategy is.

+1  A: 

See this entry in my blog on how to do this using recursive CTE's and a single IDENTITY:

Update:

If the problem is building the equipment to the next step, then you probably better use absolute value instead of relative.

Remember the previous value of the step in the variable (in the page itself or on server side), and just update it with the new value of the variable.

Instead of this:

UPDATE  mytable
SET     step = step + 1

use this:

SET @nextstep = 2
UPDATE  mytable
SET     step = @nextstep

You may also add an autoincremented last_update field to the column to make sure you're updating a column not been updated since your page has loaded:

SELECT  last_update
INTO    @lastupdate
FROM    mytable
WHERE   item_id = @id

UPDATE  mytable
SET     step = @nextstep
WHERE   item_id = @id
        AND last_update = @lastupdate

Update 2:

If you are using a linked list of states (i. e. you don't update, but insert new states), then just mark the column IDENTITY and insert the ID of the previous state:

item_id  step_id  prev_step_id
1        10232     0
1        12123     10232

, make step_id and prev_step_id unique, and query like this:

WITH    q (item_id, step_id, step_no) AS
        (
        SELECT  item_id, step_id, 1
        FROM    mytable
        WHERE   item_id = 1
                AND prev_step_id = 0
        UNION ALL
        SELECT  q.item_id, m.step_id, q.step_no + 1
        FROM    q
        JOIN    mytable m
        ON      m.item_id = q.item_id
                m.prev_step_id = q.step_id
        )
SELECT  *
FROM    q

If two people want to insert two records, then the UNIQUE constraint on prev_step_id wil fire and the last insert will fail.

Quassnoi
That's why I mentioned my example was an oversimplification. In actuality, each row corresponds to a place that a piece of equipment visited in a factory. The page (not easy to rewrite), currently moves the equipment to the next step. The problem is, if 2 people push the same button after loading the page, the equipment can move 2 steps. I need to check to make sure the previous location matches the expected location.
Jason Young
Great idea with the Unique constraint!
Jason Young
@SuperJason: will be the subject of today's post in my blog. Thanks for a nice question!
Quassnoi
A: 

This is a perfect case to use a queue. Try out Message Broker.

AlexKuznetsov
That does look interesting, but I'm wondering if that's a bit much for this problem. That looks like more of an overall architecture solution. Unfortunately, this is more of a band-aid situation. I am going to read more about that though.
Jason Young
It still would have the same problem if two actions were sent to the queue - preventing that involves the same quandary.
le dorfier
@le dorfier: SSB has a mechanism called 'conversation groups locking' that, when properly used, mitigates this very problem. Knowing that the queue is a 'queue' and not just any table allows some hockey-pockey not available to normal operations.
Remus Rusanu
A: 
UPDATE yourtable
  SET  location = 'B3'
 WHERE primary-key = 1231421
   AND location    = 'B2'

If someone's already moved it out of B2, then nothing will happen. This seems better than simply blindly incrementing the location; the user wanted it to go from B2 to B3, not push it one forward.

Alright, given the new row requirement:

INSERT INTO yourtable ( item, location ) VALUES( 123, 'B3' )
 WHERE NOT EXISTS( SELECT * FROM yourtable WHERE item = 123 AND location = B3 )

let the database do the work for you.

Matt Rogish
The problem is, B3 is a new row, no an update to an existing row.
Jason Young
A: 

the usual way to do this is to have a rowversion type column and check the row value with the value incoming from the client. if the row in the table has been updated the rownumbers won't match. index the rowversion column and it'll fly.

Mladen Prajdic
+1  A: 

This looks to me like a classic case of needing operator I always want: "ensure that a tuple satisfying these conditions exists, and give me the key."

In my case, it's usually a simple, "I have this credit card number and expiry date, what's the key for it?" I don't actually care if it's already in the DB or not (in fact, the application ought not to be able to tell, for security purposes) I just want the identifier for it if it is there, or I want it to be created if it's not, and get the new identifier for that creation.

As far as I can tell, with current DBMS technology, you need to lock the table, because you must make the decision whether to insert or not based on what's already there. I'd love to have a better solution to this, however.

Curt Sampson
A: 

I know your given example is simplified, but to meet those requirements you could do the following.

Since the INSERT/SELECT is one statement, it's implicitly in a transaction. If you need to do something that you can't express relationally you'll need to wrap it in an explicit transaction.

Primary key ensures no concurrency issues beyond your given default transaction isolation.

CREATE TABLE Sequence
(
    [Name] char(1),
    [Seq] int
    PRIMARY KEY (Name, Seq)
)
GO
CREATE PROCEDURE Sequence_Insert
(
    @name char(1)
)
AS
INSERT INTO Sequence(Name, Seq)
SELECT 
    @Name,
    COALESCE(MAX(Seq),0) + 1
FROM
    Sequence
WHERE
    Name = @Name
GO
exec Sequence_Insert 'A'
exec Sequence_Insert 'A'
exec Sequence_Insert 'B'
exec Sequence_Insert 'A'
exec Sequence_Insert 'C'
GO
SELECT * FROM Sequence
Joseph Kingry
A: 

Just wrap it into the same statement. The first insert of new value (2) will succeed, the second will add zero rows.

create table t1 (i int)
insert t1 values (1)

insert t1 (i)
    select 2 where exists (select max(i) from t1 having max(i) = 1)

insert t1 (i)
    select 2 where exists (select max(i) from t1 having max(i) = 1)
Why not just "INSERT t1(i) SELECT MAX(i) + 1 FROM t1" Of course you need to work into this the whole second part of the key (A,B,etc from the question)
Joseph Kingry
Wouldn't the following be simpler and offer better performance? INSERT t1 (i) SELECT 2 WHERE NOT EXISTS (SELECT 1 FROM t1 WHERE i = 2)
Steve Broberg
@Joseph/@Steve - I'm assuming the sequence is not a simple integer value as in my example (the question mentions this is a simplification), this is just to demonstrate the concept. The core of what I'm suggesting is to include your if-state-is-as-expected into the insert, however you determine the sequence.
+1  A: 

Conceptually, if I understand the accumulation of clarifications, the situation is that you want to record that an item has entered a new state - a piece of equipment has reached a certain step. And you want to do this based on incrementing the step it is currently believed to be in.

I would restate this to perhaps be more manageable and unambiguous. Can you simply insert a record asserting that a machine is observed in a state, with a timestamp?

Deriving the current step from previous information (that may itself be imperfectly known) seems risky, especially if it's a simple iteration calculation that can occur from 0 to n times based on circumstances.

OTOH, if it's a timestamped observation of actual state, then it's self correcting (it doesn't matter what state you thought it was in before), and multiple assertions don't cause problems.

Can you reconstruct the logic this way based on the existing forms (or maybe a small modification of the form or the network confiration or whatever)? Is there user, or ip address, etc. associated with a given step of subset of steps? Are there associated transactions that are only valid if it's at a step or subset of steps?

le dorfier
A: 

The proper strategy depends on what exactly are the actions that occur between reading B2 and inserting B3. These following points are rather theoretical and not practical T-SQL samples, but I gather that is the sense of your question.

  • If this is a cheap action with disposable outcome then you can let every transaction do it, then the first one that inserts B3 succeeds and the rest fail with a duplicate key violation (unique constraint), recover from exception and resume as if nothing happened.
  • A relatively expensive operation with disposable outcome but unlikely to happen concurrently. Same as above, you will take the penalty of disposing the outcome of an 'expensive' operation, but this will happen seldom.
  • A non-disposable outcome that can be rolled back (can be embedded in your transaction or you can enroll into DTC) and is unlikely to happen concurrently. Same as above but enroll the 'operation' into your transaction (locally or DTC), on conflict you rollback and try again with the new value of the Bs sequence.
  • The outcome is non-disposable (the result of the 'operations' cannot be ignored, it must be recorded, eg. you recorded a financial transaction). In this case you must prevent concurrency and locking is the way to go. Table locks are always overkill, you should probably go with an UPDLOCK on the lookup after 'B2'. Unfortunately transaction isolation levels are of no help here (on all levels two threads can both read 'B2' and plunge ahead, resulting, hopefully, in a deadlock at the insert time). If the 'operation' between reading B2 and inserting B3 is something as complex as returning the HTML to the user and waiting for the next POST then you probably won't be able to afford leaving such long lived U locks on real data and the best way is to device an application lock schema, using sp_getapplock.
Remus Rusanu
A: 

The recommanded SQL strategy is certainly to use SELECT FOR UPDATE. I'm suprised no one mentioned it.

SELECT id FROM tasks WHERE id = max(id) FOR UPDATE OF tasks;

SELECT FOR UPDATE locks just exactly what you need to lock, so it is much simpler than manual locking.

bortzmeyer