ansaurus

Question

SQL GUID Vs Integer

Answer 1

+4 A:

GUIDs are good as identity fields in certain cases:

When you have multiple instances of SQL (different servers) and you need to combine the different updates later on without affecting referential integrity
Disconnected clients that create data - this way they can create data without worrying that the ID field is already taken

GUIDs are generated to be globally unique, which is why they are suited for such scenarios.

Oded 2010-05-10 17:39:29

Ok, now that actually makes a lot of sense, because I have encountered times when I had to reinsert some data and the auto incremented column became a nightmare... with GUID this would not of been an issue... thanks :)

Dal 2010-05-10 17:51:29

Answer 2

+2 A:

It is globally unique, so that each record in your table has a GUID that is shared by no other item of any kind in the world. Handy if you need this kind of exclusive identification (if you are replicating the database, or combining data from multiple source). Otherwise, your dba is correct - GUIDs are much larger and less efficient that integers, and you could speed up your db (30%? maybe...)

Ray 2010-05-10 17:40:20

Yea, I can see how when 'replicating data' this would be very advantageous and remove the restriction that an auto-incremented column would give. The database design isn't the best I've seen - I could believe a 30% improvement lol.

Dal 2010-05-10 17:52:55

Answer 3

A:

They basically save you from more sometimes complicated logic of using

set @InsertID = scope_identity()

volody 2010-05-10 17:41:09

Answer 4

+2 A:

There are a ton of Google-able articles on using GUIDs as PKs and almost all of them say the same thing your DBA contractor says -- queries are faster without GUIDs as keys.

The primary use I've seen in practice (we've never used them as PKs) is with replication. The MSDN page for uniqueidentifier says about the same.

Austin Salonen 2010-05-10 17:42:25

Thanks for the input, yep they are using them on the PK which is what makes it difficult to virtually navigate through rows.

Dal 2010-05-10 17:54:21

@marc_s: oh god... typo. good catch

Austin Salonen 2010-05-10 18:32:30

@Austin: yes, with replication, GUID might make sense - as primary key. But even then I'd try to avoid (like the plague) to make that GUID column my clustering key on the table!

marc_s 2010-05-10 18:44:56

Answer 5

+3 A:

INT

Advantage:

Numeric values (and specifically integers) are better for performance when used in joins, indexes and conditions. Numeric values are easier to understand for application users if they are displayed.

Disadvantage:

If your table is large, it is quite possible it will run out of it and after some numeric value there will be no additional identity to use.

GUID

Advantage:

Unique across the server.

Disadvantage:

String values are not as optimal as integer values for performance when used in joins, indexes and conditions. More storage space is required than INT.

credit goes to : http://blog.sqlauthority.com/2010/04/28/sql-server-guid-vs-int-your-opinion/

Pranay Rana 2010-05-10 17:42:40

I doubt that may people will have more rows than a 4 byte int (2 billion) and even fewer will have more than the 8 byte BIGINT's limit of: 9,223,372,036,854,775,807. If they do, go negative and you can double the number of rows.

KM 2010-05-10 17:48:24

The disadvantage of INT can easily be overcome by using BIGINT as neccesary. At least on Mysql this is a 64 bit integer, you'll never have that many records, and it's still only 50% of the size of a GUID. The only reason to use GUID is scalability. If you do want to go with GUID, most databases allow storing the GUID as binary instead of a hex string, which does save half of the storage space and speeds up indexing.

wump 2010-05-10 17:50:34

BIG BIG disadvantage for GUID as clustering key in SQL Server: awfully bad performance, due to massive index and page fragmentation. Check out the articles mentioned in my answer - and think again. GUID's are convenient - but they're a DBA's hell come true....

marc_s 2010-05-10 18:21:23

actually GUID stands for [G]lobal Unique Identififer, so they are not just unique to a server, but to a the Universe of servers. So you can merge eventually consistant datasets across a cluster very easily. You can't do this if every server has a record PK as an INT and the same INT represents different records.

fuzzy lollipop 2010-05-10 18:39:05

Answer 6

+3 A:

Contrary to what most folks here seem to preach, I see GUID's as more of a plague than a blessing. Here's why:

GUIDs may seem to be a natural choice for your primary key - and if you really must, you could probably argue to use it for the PRIMARY KEY of the table. What I'd strongly recommend not to do is use the GUID column as the clustering key, which SQL Server does by default, unless you specifically tell it not to.

You really need to keep two issues apart:

the primary key is a logical construct - one of the candidate keys that uniquely and reliably identifies every row in your table. This can be anything, really - an INT, a GUID, a string - pick what makes most sense for your scenario.
the clustering key (the column or columns that define the "clustered index" on the table) - this is a physical storage-related thing, and here, a small, stable, ever-increasing data type is your best pick - INT or BIGINT as your default option.

By default, the primary key on a SQL Server table is also used as the clustering key - but that doesn't need to be that way! I've personally seen massive performance gains when breaking up the previous GUID-based Primary / Clustered Key into two separate key - the primary (logical) key on the GUID, and the clustering (ordering) key on a separate INT IDENTITY(1,1) column.

As Kimberly Tripp - the Queen of Indexing - and others have stated a great many times - a GUID as the clustering key isn't optimal, since due to its randomness, it will lead to massive page and index fragmentation and to generally bad performance.

Yes, I know - there's newsequentialid() in SQL Server 2005 and up - but even that is not truly and fully sequential and thus also suffers from the same problems as the GUID - just a bit less prominently so. Plus, you can only use it as a default for a column in your table - you cannot get a new sequential GUID in T-SQL code (like a trigger or something) - another major drawback.

Then there's another issue to consider: the clustering key on a table will be added to each and every entry on each and every non-clustered index on your table as well - thus you really want to make sure it's as small as possible. Typically, an INT with 2+ billion rows should be sufficient for the vast majority of tables - and compared to a GUID as the clustering key, you can save yourself hundreds of megabytes of storage on disk and in server memory.

Quick calculation - using INT vs. GUID as Primary and Clustering Key:

Base Table with 1'000'000 rows (3.8 MB vs. 15.26 MB)
6 nonclustered indexes (22.89 MB vs. 91.55 MB)

TOTAL: 25 MB vs. 106 MB - and that's just on a single table!

Some more food for thought - excellent stuff by Kimberly Tripp - read it, read it again, digest it! It's the SQL Server indexing gospel, really.

Marc

marc_s 2010-05-10 18:19:39

ansaurus

tags:

views:

answers:

SQL GUID Vs Integer

related questions