views:

2900

answers:

16

I'm currently working on someone else's database where the primary keys are generated via a lookup table which contains a list of table names and the last primary key used. A stored procedure increments this value and checks it is unique before returning it to the calling 'insert' SP.

What are the benefits for using a method like this (or just generating a GUID) instead of just using the Identity/Auto-number?

I'm not talking about primary keys that actually 'mean' something like ISBNs or product codes, just the unique identifiers.

Thanks.

+15  A: 

An auto generated ID can cause problems in situations where you are using replication (as I'm sure the techniques you've found can!). In these cases, I generally opt for a GUID.

If you are not likely to use replication, then an auto-incrementing PK will most likely work just fine.

Galwegian
+6  A: 

The procedure method of incrementing must be thread safe. If not, you may not get unique numbers. Also, it must be fast, otherwise it will be an application bottleneck. The built in functions have already taken these two factors into account.

dacracot
+2  A: 

One benefit is that it can allow the database/SQL to be more cross-platform. The SQL can be exactly the same on SQL Server, Oracle, etc...

Corey Trager
+5  A: 

It's useful for clients to be able to pre-allocate a whole bunch of IDs to do a bulk insert without having to then update their local objects with the inserted IDs. Then there's the whole replication issue, as mentioned by Galwegian.

Jon Skeet
+3  A: 

Using a unique identifiers would allow you to merge data from two different databases.

Maybe you have an application that collects data in multiple database and then "syncs" with a master database at various times in the day. You wouldn't have to worry about primary key collisions in this scenario.

Or, possibly, you might want to know what a record's ID will be before you actually create it.

bobwienholt
+1  A: 

The only reason I can think of is that the code was written before sequences were invented and the code forgot to catch up ;)

shyam
Right. The database for one of our oldest apps had the home-grown approach using a sequence table (basically what is described) because MySQL didn't support them (or possibly it was buggy) at the time. The tables still exist, but we have been using the real thing for years now.
MBCook
+1  A: 

I would prefer to use a GUID for most of the scenarios in which the post's current method makes any sense to me (replication being a possible one). If replication was the issue, such a stored procedure would have to be aware of the other server which would have to be linked to ensure key uniqueness, which would make it very brittle and probably a poor way of doing this.
One situation where I use integer primary keys that are NOT auto-incrementing identities is the case of rarely-changed lookup tables that enforce foreign key constraints, that will have a corresponding enum in the data-consuming application. In that scenario, I want to ensure the enum mapping will be correct between development and deployment, especially if there will be multiple prod servers.

Grank
A: 

The only real reason to do this is to be database agnostic (if different db versions use different auto-numbering techniques).

The other issue mentioned here is the ability to create records in multiple places (like in the central office as well as on traveling users' laptops). In that case, though, you would probably need something like a "sitecode" that was unique to each install that was prefixed to each ID.

BradC
A: 

Another potential reason is that you deliberately want random keys. This can be desirable if, say, you don't want nosey browsers leafing through every item you have in the database, but it's not critical enough to warrant actual authentication security measures.

bobince
+13  A: 

There's nothing inherently wrong with using AutoNumber, but there are a few reasons not to do it. Still, rolling your own solution isn't the best idea, as dacracot mentioned. Let me explain.

The first reason not to use AutoNumber on each table is you may end up merging records from multiple tables. Say you have a Sales Order table and some other kind of order table, and you decide to pull out some common data and use multiple table inheritance. It's nice to have primary keys that are globally unique. This is similar to what bobwienholt said about merging databases, but it can happen within a database.

Second, other databases don't use this paradigm, and other paradigms such as Oracle's sequences are way better. Fortunately, it's possible to mimic Oracle sequences using SQL Server. One way to do this is to create a single AutoNumber table for your entire database, called MainSequence, or whatever. No other table in the database will use autonumber, but anyone that needs a primary key generated automatically will use MainSequence to get it. This way, you get all of the built in performance, locking, thread-safety, etc. that dacracot was talking about without having to build it yourself.

Another option is using GUIDs for primary keys, but I don't recommend that because even if you are sure a human (even a developer) is never going to read them, someone probably will, and it's hard. And more importantly, things implicitly cast to ints very easily in T-SQL but can have a lot of trouble implicitly casting to a GUID. Basically, they are inconvenient.

In building a new system, I'd recommend using a dedicated table for primary key generation (just like Oracle sequences). For an existing database, I wouldn't go out of my way to change it.

Greg
+8  A: 

from CodingHorror:

GUID Pros

  • Unique across every table, every database, every server
  • Allows easy merging of records from different databases
  • Allows easy distribution of databases across multiple servers
  • You can generate IDs anywhere, instead of having to roundtrip to the database
  • Most replication scenarios require GUID columns anyway

GUID Cons

  • It is a whopping 4 times larger than the traditional 4-byte index value; this can have serious performance and storage implications if you're not careful
  • Cumbersome to debug (where userid='{BAE7DF4-DDF-3RG-5TY3E3RF456AS10}')
  • The generated GUIDs should be partially sequential for best performance (eg, newsequentialid() on SQL 2005) and to enable use of clustered indexes

The article provides a lot of good external links on making the decision on GUID vs. Auto Increment. If I can, I go with GUID.

andyh_ky
A: 

A useful side benefit of using a GUID primary key instead of an auto-incrementing one is that you can assign the PK value for a new row on the client side (in fact you have to do this in a replication scenario), sparing you the hassle of retrieving the PK of the row you just added on the server.

One of the downsides of a GUID PK is that joins on a GUID field are slower (unless this has changed recently). Another upside of using GUIDs is that it's fun to try and explain to a non-technical manager why a GUID collision is rather unlikely.

MusiGenesis
A: 

My main issue with auto-incrementing keys is that they lack any meaning.

For tables where certain fields provide uniqueness (whether alone or in combination with another), I'd opt for using that instead.

Kevin Fairchild
I believe every table should have a "meaningful" key, but unless it is a single attribute integral value, (seldom the case), it is just too much of a performance hit to use for FKs, and relational joins... So I always try to put two unique keys on every table, an autoNum key and a meaningful key.
Charles Bretana
I don't think it's always good for PKs to have meaning. If they do have meaning, you're relying on the fact that the business rules won't change in such a way that allows there to be duplicate values. With "meaningless" PKs, this is not a concern.
recursive
...hence the stipulation of "where certain fields provide uniqueness". I don't believe it's a one-or-the-other type of thing.
Kevin Fairchild
I've seen business rules change until five different tables had the same 7-column key. Your data is a city by the sea, and reality is always a hurricane just over the horizon. Surrogate keys are the sandbags that help you survive the flash floods while you shore up the rest of your structures.
WCWedin
+1  A: 

My main issue with auto-incrementing keys is that they lack any meaning

That's a requirement of a primary key, in my mind -- to have no other reason to exist other than identifying a record. If it has no real-world meaning, then it has no real-world reason to change. You don't want primary keys to change, generally speaking, because you have to search-replace your whole database or worse. I have been surprised at the sorts of things I have assumed would be unique and unchanging that have not turned out to be years later.

A: 

Galwegian's answer is not necessarily true.

With MySQL you can set a key offset for each database instance. If you combine this with a large enough increment it will for fine. I'm sure other vendors would have some sort of similar settings.

Lets say we have 2 databases we want to replicate. We can set it up in the following way.

increment = 2
db1 - offset = 1
db2 - offset = 2

This means that

db1 will have keys 1, 3, 5, 7....

db2 will have keys 2, 4, 6, 8....

Therefore we will not have key clashes on inserts.

Alex
A: 

Here's the thing with auto incrementing integers as keys:

You HAVE to have posted the record before you get access to it. That means that until you have posted the record, you cannot, for example, prepare related records that will be stored in another table, or any one of a lot of other possible reasons why it might be useful to have access to the new record's unique id, before posting it.

The above is my deciding factor, whether to go with one method, or the other.