Hi,

I would like to know the general practice used in the industry for generating sequence numbers.

i.e., get the max from a table, increment it, and store it back.

In order for this to work, which isolation level and/or locking scheme should be used?

I thought SERIALIZABLE should work fine, but it only prevents updates to the table; selects can still proceed. So two sessions could read the same value and try to write the same incremented result. How can we avoid this?

Thanks!

+1  A: 

The general practice is not to do this at all, but to use auto-increment fields, automatically generated sequence fields, or whatever facility the database provides.
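As a minimal sketch of what "let the database do it" means — using Python with SQLite for illustration (the thread is about MySQL; the table and column names here are made up), where `INTEGER PRIMARY KEY` plays the role of MySQL's `AUTO_INCREMENT`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY makes SQLite assign the id itself,
# the same idea as MySQL's AUTO_INCREMENT
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")
conn.execute("INSERT INTO users (name) VALUES ('bob')")
rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'alice'), (2, 'bob')]
```

No application code ever computes the next number, so there is no value to race over.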

John Burton
I see. My DB is MySQL. Initially I was thinking about using auto-increment fields, but I read quite a few negative things about them and decided against using them.
I just blatantly assumed you wanted a second sequence besides an auto-increment field. What are your exact worries about auto-incrementing?
Wrikken
Thanks mate. I did not want large gaps in the values generated, so I thought of applying my own solution.
+1  A: 

It's a bit unclear: are you sure you want to get the sequence number from the RDBMS, or can you just implement the concept in your favorite programming language? The key question is how you plan to share the value.

Rather than using MAX(), just have a simple one-row, one-column table that holds the value. Implement an increment-and-fetch function and use it everywhere. If your DBMS supports it, this is an ideal use for a stored procedure or trigger.
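A minimal sketch of such an increment-and-fetch function, using Python with SQLite for illustration (the thread is about MySQL, where a stored procedure would fill the same role; table and function names here are made up). The point is that the update and the read happen inside one transaction that holds the write lock:

```python
import sqlite3

# Autocommit mode so we can issue BEGIN IMMEDIATE explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE sequence (value INTEGER NOT NULL)")
conn.execute("INSERT INTO sequence (value) VALUES (0)")

def next_value(conn):
    # BEGIN IMMEDIATE takes the write lock up front, so concurrent
    # callers serialize instead of both reading the same value.
    conn.execute("BEGIN IMMEDIATE")
    conn.execute("UPDATE sequence SET value = value + 1")
    (value,) = conn.execute("SELECT value FROM sequence").fetchone()
    conn.execute("COMMIT")
    return value

print(next_value(conn), next_value(conn))  # 1 2
```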

fishtoprecords
I was thinking of implementing it this way: 1) fetch the current value from the DB (stored in a particular column; no MAX() function will be used); 2) increment the value fetched in step 1; 3) update the DB with the new value. But if I implement it this way, two different connections could end up writing the same value to the DB.
+2  A: 

Note that under the REPEATABLE READ isolation level, the default for InnoDB, you can simply use the SELECT ... FOR UPDATE syntax, as follows:

Test schema:

CREATE TABLE your_table (id int) ENGINE=INNODB;
INSERT INTO your_table VALUES (1), (2), (3);

Then we can do the following:

START TRANSACTION;

SELECT @x := MAX(id) FROM your_table FOR UPDATE;

+---------------+
| @x := MAX(id) |
+---------------+
|             3 |
+---------------+
1 row in set (0.00 sec)

Without committing the transaction, we start another separate session, and do the same:

START TRANSACTION;

SELECT MAX(id) FROM your_table FOR UPDATE;

The database will wait until the lock set in the previous session is released before running this query.

Switching back to the previous session, we can insert the new row and commit the transaction:

INSERT INTO your_table VALUES (@x + 1);

COMMIT;

After the first session commits the transaction, the lock is lifted and the query in the second session returns:

+---------+
| MAX(id) |
+---------+
|       4 |
+---------+
1 row in set (8.19 sec)
Daniel Vassallo
Thanks mate. That should solve my problem I guess. Which one would you prefer? Implementing it this way or using the auto increment fields?
@rocksolid: In most situations, I'd go for the auto_increment fields, for the reasons Bill Karwin outlined in his answer.
Daniel Vassallo
Thanks Daniel. Auto increment it shall be! :-)
+4  A: 

Anything you do within transaction scope is subject to race conditions.

So any SQL query you do to get the last used value, increment it, and store it in a new row means that two concurrent clients could fetch the same value and try to use it, resulting in a duplicate key.
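The race is easy to see if you interleave the steps by hand — a sketch using Python with SQLite for illustration (table name made up), simulating two clients that both read before either writes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# Interleave two "clients": both read MAX(id) before either writes,
# so both compute the same next value.
a = conn.execute("SELECT MAX(id) FROM t").fetchone()[0] + 1
b = conn.execute("SELECT MAX(id) FROM t").fetchone()[0] + 1
print(a, b)  # 4 4 -- the second INSERT would hit a duplicate key
```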

There are a few solutions to this:

  1. Locking. Each client sets an exclusive lock on the rows it reads by using SELECT ... FOR UPDATE (as @Daniel Vassallo describes).

  2. Use auto-increment. This mechanism guarantees no race conditions, because allocation of new values happens without regard to transaction scope. As a benefit, no two concurrent clients will get the same value. This means, though, that a rollback doesn't undo allocation of a value. The LAST_INSERT_ID() function returns the last auto-increment value allocated by the current session, even if other concurrent clients are also generating values in the same table or different tables.

  3. Use an external solution. Generate primary key values not using SQL but with some other system in your application. You're responsible for protecting against race conditions. For instance you could use a counting semaphore.

  4. Use a pseudorandom, unique id. Primary keys need to be unique, but they don't need to be monotonically increasing integers. Some people use the UUID() function to generate a random 128-bit number that's virtually guaranteed to not have duplicates. But then your primary keys have to use a larger data type such as CHAR(36) or BINARY(16) and it's inconvenient to write ad hoc queries.

    SELECT * FROM MyTable WHERE id = '6ccd780c-baba-1026-9564-0040f4311e29';
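Options 2 and 4 can be sketched briefly — using Python with SQLite for illustration (the thread is about MySQL; table and column names here are made up), where `cursor.lastrowid` is the per-connection analogue of `LAST_INSERT_ID()`:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# Option 2: let the database allocate the id; cursor.lastrowid is
# sqlite3's per-connection analogue of MySQL's LAST_INSERT_ID().
cur = conn.execute("INSERT INTO items (name) VALUES ('first')")
print(cur.lastrowid)  # 1

# Option 4: a random 128-bit UUID instead of a sequence; unique,
# but wider (a 36-char string) and not monotonically increasing.
key = str(uuid.uuid4())
print(len(key))  # 36
```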

You mention in a comment that you "read some negative things" about using auto-increment. Of course any feature in any language has do's and don'ts. It doesn't mean we shouldn't use those features -- it means we should learn how to use them properly.

Can you describe your concerns or any of the negative things about auto-increment? Perhaps folks on this thread can address them.

Bill Karwin
Thanks a lot for the detailed explanation. I had read that in certain cases there could be large gaps in the values generated by the server. Since I wished to have sequential data in my table, I thought of writing my own solution. But having read the comments here, I believe using an auto-increment field is the best way.
@Bill - Which one would you suggest? Implementing it using Select ... For Update or using auto-increment?
I definitely recommend using auto-increment. Locking can block concurrent readers (depending on your transaction isolation level).
Bill Karwin
I added another solution using the `UUID()` function, but I include it only for completeness. I still recommend using auto-increment.
Bill Karwin
Okay then. Thanks a lot!!!