views:

177

answers:

6

Do you ever use a separate table for "generating" artificial primary keys for DB (and why)? What I mean is to have a table with two columns, table name and current ID - with which you could get new "ID" for some table by simply locking the row with that table name, getting the current value of the key, increment it by one, and unlock the row. Why would you prefer this over standard integer identity column?

P.S. The "idea" is from Fowlers Patterns of Enterprise Application Architecture, btw...

+3  A: 

The only time I have ever used this is when I had an application in BTrieve, and it didn't have an identity column. And I should also say when they tried to use this table, it caused a massive slow down when they tried to import data, because of all the extra reads and writes. My friend looked at it and rewrote how they did it to speed it up, but the moral of the story is that if you do something like this incorrectly, there can be brutal consequences.

Personally, I don't think I would ever want to do this. There is too much possibility for error. Two people try and use the same key, because they forgot to lock the table before grabbing the id. This just seems like something that should be left up to the RDBMS if at all possible. As Will brought up, it's easy to minimize this situation, but if you don't know what you are doing it can happen.

Kevin
If you have a function getting the ID and incrementing the value in this table using a transaction with a row isolation level, then such situations will become almost impossible.
Will Marcouiller
@Will - yes... **if** you do it right, and it's easy to get wrong. And it creates a point of contention in the app that can hurt performance.
Joel Coehoorn
+5  A: 

Whenever you can use sql server's identity or guid features, you should. However, there are a few situations where this may not be possible.

One example is that sql server only allows one identity column per table. Rarely, a table will have records that need both a private id and a public id, and a limit of one identity column means generating both as integers can be a pain. You could always use a guid for one, but you want the integer on the private id for speed and you may also want the public id to be more human readable than a guid.

In this situation, an extra table for generating the ids can make sense. However, I'd do it a bit differently. Still have two columns in the table, but make one "shadow" or "Id mapping" table for every real table. One of the columns will be your private id (unique constraint) and one will be your public id (identity with maybe an increment value of '7' or '13' or other number that's less obvious than '1').

The key difference here is that you don't want to do the locking yourself. Let sql server handle it.

Joel Coehoorn
+7  A: 

This is called Hi/Lo assignment.

You would do this having either a trigger on INSERT on your tables getting the ID from this table and incrementing it before or after you get your ID, depending of your choice.

This is commonly used when you have to deal with multiple database engines. The autoincremental identifier in Oracle is through a SEQUENCE, which you increment with SEQUENCE.NEXTVALUE from within a BEFORE INSERT TRIGGER on your data table.

Oppositly, SQL Server has IDENTITY columns, autoincrementing natively and this is managed by the DBE itself.

In order for your software to work on both DBE, you have to come to some sort of a standard, then the most common "standard" used for this is the Hi/Lo assignment to the primary key.

This is one approach amongst others. These days, with ORM Mapping tools such as NHibernate, it is offered through configuration so that you need less to care on both the application and the database sides.

EDIT #1

Because this kind of maneuvre can't be used for a global scope, you'd have to have such a table per database, or database schema. This way, each schema is indenpendant from the other. However, data in one schema can't implicitly be moved toward another with the same key, as it would perhaps be conflicted with an already existing row.

As for a security schema, it accesses the same database as another schema or user, so no additional table should exist for specific security schema.

Will Marcouiller
+1 that's really nice explanation.
Kevin
lack of native auto-increment is my single biggest complaint about oracle.
Joel Coehoorn
Working with Oracle Sequences can be fun too! =) It is different only, but serves the exact same purpose. But I agree, when I first run into Oracle two years ago, it took me some time to get used to this and other different features. However, PL/SQL offers many interesting features that TSQL would rather take from too. So, both have weaknesses and strong points. =)
Will Marcouiller
wow never knew about Hi/Lo assignment....nice explanation learnt something new. thanks :-)
Raja
+1 I've used that myself and yes, it's to support 4 different DBMS, nicely done :)
Jaya Wijaya
+1  A: 

I have seen this pattern used when data created in one database needs to be migrated, backed-up, clustered or staged to another database. In this situation, first of all your want to ensure the primary keys will not need to change. Secondly the foreign keys. Thirdly, externally exposed keys or durable references.

Jennifer Zouak
+1  A: 

You wouldn't prefer it at all.

Whatever you gain by using the pattern or becoming DB agnostic, you'll lose in headaches, support and performance.

gbn
+1  A: 

locking the row with that table name, getting the current value of the key, increment it by one, and unlock the row

This sounds simple, doesn't it?

   UPDATE TableOfId
     SET Id += 1
   OUTPUT Inserted.Id
   WHERE Name = @Name;

In reality, its a disaster. No activity occurs in the application as a standalone operation: all operations are part of transactions. One cannot simply 'unlock' the row because the 'unlock' will actually occur only at commit time. Which means that all transactions that need an Id on a table are serialized and only one can proceed at any time. It also means that transaction that access more than one table will likely deadlock on updating the table of Ids because enforcing the 'get the next Id' update order is hard in practice.

To avoid complete serialization one needs to obtain the Ids on separate, standalone, transactions that can commit immediately (usually implicit auto-commit transaction on the UPDATE itself). But this complicates the application logic tremendously. Every operation needs to maintain two separate connections to the database, one to do the normal transaction logic and another one to obtain the needed Ids. Even then, the update of Ids can become such a hot spot that it can still cause visible contention and blocking (similar to the dreaded 'update page hit count +1' prevalent on web apps).

In short: use IDENTITY. The identity generation is optimized for high concurrency.

Remus Rusanu