views:

197

answers:

6

Hi

I have recently started a new job and noticed that all the SQL tables use the GUID data type for the primary key.

In my previous job we used integers (Auto-Increment) for the primary key and it was a lot more easier to work with in my opinion.

For example, say you had two related tables; Product and ProductType - I could easily cross check the 'ProductTypeID' column of both tables for a particular row to quickly map the data in my head because its easy to store the number (2,4,45 etc) as opposed to (E75B92A3-3299-4407-A913-C5CA196B3CAB).

The extra frustration comes from me wanting to understand how the tables are related, sadly there is no Database diagram :(

A lot of people say that GUID's are better because you can define the unique identifer in your C# code for example using NewID() without requiring SQL SERVER to do it - this also allows you to know provisionally what the ID will be.... but I've seen that it is possible to still retrieve the 'next auto-incremented integer' too.

A DBA contractor reported that our queries could be up to 30% faster if we used the Integer type instead of GUIDS...

Why does the GUID data type exist, what advantages does it really provide?... Even if its a choice by some professional there must be some good reasons as to why its implemented?

+4  A: 

GUIDs are good as identity fields in certain cases:

  • When you have multiple instances of SQL (different servers) and you need to combine the different updates later on without affecting referential integrity
  • Disconnected clients that create data - this way they can create data without worrying that the ID field is already taken

GUIDs are generated to be globally unique, which is why they are suited for such scenarios.

Oded
Ok, now that actually makes a lot of sense, because I have encountered times when I had to reinsert some data and the auto incremented column became a nightmare... with GUID this would not of been an issue... thanks :)
Dal
+2  A: 

It is globally unique, so that each record in your table has a GUID that is shared by no other item of any kind in the world. Handy if you need this kind of exclusive identification (if you are replicating the database, or combining data from multiple source). Otherwise, your dba is correct - GUIDs are much larger and less efficient that integers, and you could speed up your db (30%? maybe...)

Ray
Yea, I can see how when 'replicating data' this would be very advantageous and remove the restriction that an auto-incremented column would give. The database design isn't the best I've seen - I could believe a 30% improvement lol.
Dal
A: 

They basically save you from more sometimes complicated logic of using

set @InsertID = scope_identity() 
volody
+2  A: 

There are a ton of Google-able articles on using GUIDs as PKs and almost all of them say the same thing your DBA contractor says -- queries are faster without GUIDs as keys.

The primary use I've seen in practice (we've never used them as PKs) is with replication. The MSDN page for uniqueidentifier says about the same.

Austin Salonen
Thanks for the input, yep they are using them on the PK which is what makes it difficult to virtually navigate through rows.
Dal
@marc_s: oh god... typo. good catch
Austin Salonen
@Austin: yes, with replication, GUID might make sense - as primary key. But even then I'd try to avoid (like the plague) to make that GUID column my clustering key on the table!
marc_s
+3  A: 

INT

Advantage:

Numeric values (and specifically integers) are better for performance when used in joins, indexes and conditions. Numeric values are easier to understand for application users if they are displayed.

Disadvantage:

If your table is large, it is quite possible it will run out of it and after some numeric value there will be no additional identity to use.

GUID

Advantage:

Unique across the server.

Disadvantage:

String values are not as optimal as integer values for performance when used in joins, indexes and conditions. More storage space is required than INT.

credit goes to : http://blog.sqlauthority.com/2010/04/28/sql-server-guid-vs-int-your-opinion/

Pranay Rana
I doubt that may people will have more rows than a 4 byte int (2 billion) and even fewer will have more than the 8 byte BIGINT's limit of: 9,223,372,036,854,775,807. If they do, go negative and you can double the number of rows.
KM
The disadvantage of INT can easily be overcome by using BIGINT as neccesary. At least on Mysql this is a 64 bit integer, you'll never have that many records, and it's still only 50% of the size of a GUID. The only reason to use GUID is scalability. If you do want to go with GUID, most databases allow storing the GUID as binary instead of a hex string, which does save half of the storage space and speeds up indexing.
wump
BIG BIG disadvantage for GUID as clustering key in SQL Server: awfully bad performance, due to massive index and page fragmentation. Check out the articles mentioned in my answer - and think again. GUID's are convenient - but they're a DBA's hell come true....
marc_s
actually GUID stands for [G]lobal Unique Identififer, so they are not just unique to a server, but to a the Universe of servers. So you can merge eventually consistant datasets across a cluster very easily. You can't do this if every server has a record PK as an INT and the same INT represents different records.
fuzzy lollipop
+3  A: 

Contrary to what most folks here seem to preach, I see GUID's as more of a plague than a blessing. Here's why:

GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table. What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default, unless you specifically tell it not to.

You really need to keep two issues apart:

  1. the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.

  2. the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way! I've personally seen massive performance gains when breaking up the previous GUID-based Primary / Clustered Key into two separate key - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.

As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.

Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so. Plus, you can only use it as a default for a column in your table - you cannot get a new sequential GUID in T-SQL code (like a trigger or something) - another major drawback.

Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT vs. GUID as Primary and Clustering Key:

  • Base Table with 1'000'000 rows (3.8 MB vs. 15.26 MB)
  • 6 nonclustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: 25 MB vs. 106 MB - and that's just on a single table!

Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.

Marc

marc_s

related questions