views:

229

answers:

4

Hi , I have read a lot of articles about whether we should have primary keys that are identity columns, but I'm still confused.

There are advantages of making columns are identity as it would give better performance in joins and provides data consistency. But there is a major drawback associated with identity ,i.e.When INSERT statement fails, still the IDENTITY value increases If a transaction is rolled back, the new IDENTITY column value isn't rolled back, so we end up with gaps in sequencing. I can use GUIDs (by using NEWSEQUENTIALID) but it reduces performance.

+4  A: 

Gaps should not matter: the identity column is internal and not for end user usage or recognition.

GUIDs will kill performance, even sequential ones, because of the 16 byte width.

An identity column should be chosen to respect the physical implementation after modelling your data and working out what your natural keys are. That is, the chosen natural key is the logical key but you choose a surrogate key (identity) because you know how the engine works.

Or you use an ORM and let the client tail wag the database dog...

gbn
I wouldn't say, that Guids *kill* performance, as i wrote, we use Guids a lot as PK on a big near realtime system without having performance problems in joining vai Guid columns. But of cause the Guid type is larger than an integer.
Jan
@Jan: It bloats all your non-clustered indexes because the refer to the clustered key + it causes fragmentation. I use GUID myself, but never as a clustered PK
gbn
@Jan, the problem with GUIDs is that there is no meaningful way to order them.. Therefore an index on a GUID column is nowhere near as useful as an Index on an integer column.
Ash
@Jan: yes, it **KILLS** performance - totally. I went from a legacy system with GUID as PK/CK and 99.5% index fragmentation on a daily basis to using INT IDENTITY - **HUGE** difference. Hardly any index fragmentation anymore, performance is significantly better. GUIDs as Clustering Index on your SQL Server table are BAD BAD BAD - period.
marc_s
@Ash i don't no what you want to say with "there is no meaningful way to order [Guids]" I agree, that you will cause fragmentation when inserting with newid(). but its definitly possible to order them and gain reasonable join performance with indexes on guid-columns.@gbn yes, i wrote, that you will have increased table/index sizes. We use Guids as PK but not clustered. We rebuild indexes once a week and but we definitly havn't 99% fragmentation.Out main reason for doing this is, that we have to insert data on many merge replicated sql servers independently.
Jan
@Jan: OK, you're not using clustered PKs which is not the default. Perhaps you should mentioned this in your answer. (Hope you have some clustered index though).
gbn
@gbn: yes, you are right - of cause we use clustered indexes. But i has to admit that i thougt of clustered keys just as the *sort order* of the data table itself before our discussion here. I thougt the lookup of non clustered indexes is done via th pk not the clustered key.
Jan
@Jan: NC indexes point to C index. http://msdn.microsoft.com/en-us/library/ms177484.aspx "Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value."
gbn
+1  A: 

I would like to say that depends on your needs. We use only Guids as primary keys (with default set to NewID) because we develop a distributed system with many Sql Server instances, so we have to be sure that every Sql Server generate unique primary key values. But when using a Guid column as PK, be sure not to use it as your clustered index (thanks to marc_s for the link)

Advantage of the Guid type:

  • You can create unique values on different locations without synchronization

Disadvantage:

  • Its a large datatype (16 Bytes) and needs significant more space
  • It creates index fragmentation (at least when using the newid() function)

Dataconsistency is not an issue with primary keys independent of the datatype because a primary key has to be unique by definition!

I don't believe that an identity column has better join performance. At all, performance is a matter of the right indexes. A primary key is a constraint not an index.

Is your need to have a primary key of typ int with no gaps? This should'nt be a problem normally.

Jan
I just hope you're not using your GUID's as clustering key on the tables!! That just calls for bad performance and massive index fragmentation - not a good thing! See http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx
marc_s
No, not clustered - just pk
Jan
+2  A: 

For all practical purposes, integers are ideal for primary keys and auto increment is a perfect way to generate them. As long as your PK is meaningless (surrogate) it will be protected from creativity of you customers and serve its main purpose (to identify a row in a table) just fine. Indexes are packed, joins fast as it gets, and it is easy to partition tables.
If you happen to need GUID, that's fine too; however, think auto-increment integer first.

Damir Sudarevic
A: 

"yes, it KILLS performance - totally. I went from a legacy system with GUID as PK/CK and 99.5% index fragmentation on a daily basis to using INT IDENTITY - HUGE difference. Hardly any index fragmentation anymore, performance is significantly better. GUIDs as Clustering Index on your SQL Server table are BAD BAD BAD - period."

Might be true, but I see no logical reasoning according to which this leads me to conclude that GUIDs PER SE are also BAD BAD BAD.

Maybe you should consider using other types of indexes on such data. And if your dbms does not offer you a choice between several types of index, then perhaps you should consider getting yourself a better dbms.

Erwin Smout