views:

1782

answers:

15

I'm aware of the benefits of using a GUID, as well as the benefits of using and INT as a PK in a database. Considering that a GUID is in essence a 128 bit INT and a normal INT is 32 bit, the INT is a space saver (though this point is generally moot in most modern systems).

In the end, in what circumstances would you see yourself using an INT as a PK versus a GUID?

+1  A: 

An INT is certainly much easier to read when debugging, and much smaller.

I would, however, use a GUID or similar as a license key for a product. You know it's going to be unique, and you know that it's not going to be sequential.

Hooloovoo
A: 

Discussed gazillion times.

Anton Gogolev
You didn't click that link did you? It brings back one result; this one.
Nathan Ridley
This must be some new definition of the word "gazillion" of which I was previously unaware.
paxdiablo
Fixed search link: WMD editor added + signs which apparently have special meaning for Lucene
Anton Gogolev
@Pax: The definition of "gazillion" is moot, anyway. ;-) I would argue that this topic has been thoroughly covered on SO already. Even without looking at a search engine this is a pretty safe guess, I'd say.
Tomalak
@Tomalek, I'm pretty sure it should be more than one :-). I wasn't aware of the quantity of questions related to the topic (since it holds little interest) but I had to downvote someone who seemed to just be giving a wrong answer. In any case, since it was a simple data entry error, I've reversed my downvote and given an upvote by way of apology to @Anton.
paxdiablo
+5  A: 

Kimberley Tripp (SQLSkills.com) has an article on using GUID's as primary keys. She advices against it because over the unnecessary overhead.

Ronald Wildenberg
+2  A: 

When comparing values such as Primary to Foreign key relationship, the INT will be faster. If the tables are indexed properly and the tables are small, you might not see much of a slow down, but you'd have to try it to be sure. INTs are also easier to read, and communicate with other people. It's a lot simpler to say, "Can you look at record 1234?" instead of "Can you look at record 031E9502-E283-4F87-9049-CE0E5C76B658?"

Kevin
+1  A: 

Some OSes don't generate GUIDs anymore based on unique hardware features (CPUID,MAC) because it made tracing users to easy (privacy concerns). This means the GUID uniqueness is often no longer as universal as many people think.

If you use some auto-id function of your database, the database could in theory make absolutely sure that there is no duplication.

Marco van de Voort
GUIDs these days are usually randomly generated
Jim Arnold
A: 

I always think PK's should be numeric where possble. Dont forget having GUIDs as a PK will probably mean that they are also used in other tables as foriegn keys, so paging and index etc will be greater.

kevchadders
What if the natural key of the record is not numeric; e.g. (host, timestamp) for a log message record, or (product_code) for a product record? Would you insist on adding a numeric field serving no purpose except to have a redundant key?
bignose
No I wouldn't, but for a timestamp field you could consider adding a Identity Field to the table and use that as the key instead of the timestamp. As they are both generated by the DB.If its a product code then I would always use that for the ID as that is product specific based on your business so it makes no sence to change it to an ID.It all depends on the type of data you will be storing and how you will go about designing your database.
kevchadders
+1  A: 

I think the database also matters. From a MySQL perspective - generally, the smaller the datatype the faster the performance.

It seems to hold true for int vs GUID too - http://kccoder.com/mysql/uuid-vs-int-insert-performance/

Sam
A: 

I would use GUID as PK only if this key bounds to similar value. For example, user id (users in WinNT are describes with GUIDs), or user group id. Another one example. If you develop distributed system for documents management and different parts of system in different places all over the world can create some documents. In such case I would use GUID, because it guaranties that 2 documents created in different parts of distributed system wouldn't have same Id.

Dmitry Lobanov
+1  A: 

To answer your question: In the end, in what circumstances would you see yourself using an INT as a PK versus a GUID?

I would use a GUID if my system would have an online/offline version that inside the offline version you can save data and that data is transferred back to the server one day during a synch. That way, you are sure that you won't have the same key twice inside your database.

Nordes
A: 

If the data lives in a single database (as most data for the applications that we write in general does), then I use an IDENTITY. It's easy, intended to be used that way, doesn't fragment the clustered index and is more than enough. You'll run out of room at 2 billion some records (~ 4 billion if you use negative values), but you'd be toast anyway if you had that many records in one table, and then you have a data warehousing problem.

If the data lives in multiple, independent databases or interfaces with a third-party service, then I'll use the GUID that was likely already generated. A good example would be a UserProfiles table in the database that maps users in Active Directory to their user profiles in the application via their objectGUID that Active Directory assigned to them.

Nicholas Piasecki
A: 

the INT is a space saver (though this point is generally moot in most modern systems).

Not so. It may seem so at first glance, but note that the primary key of each table will be repeated multiple times throughout the database in indexes and as foreign key in other tables. And it will be involved in nearly any query containing its table - and very intensively when it's a foreign key used for a join.

Furthermore, remember that modern CPUs are very, very fast, but RAM speeds have not kept up. Cache behaviour becomes therefore increasingly important. And the best way to get good cache behaviour is to have smaller data sets. So the seemingly irrelevant difference between 4 and 16 bytes may well result in a noticeable difference in speed. Not necessarily always - but it's something to consider.

Michael Borgwardt
A: 

If you are planning on merging database at some stage, ie for a multi-site replication type setup, Guid's will save a lot of pain. But other than that I find Int's easier.

Craig
A: 

Apart from being a poor choice when you need to synchronize several database instances, INT's have one drawback I haven't seen mentioned: inserts always occur at one end of the index tree. This increases lock contention when you have a table with a lot of movement (since the same index pages have to be modified by concurrent inserts, whereas GUID's will be inserted all over the index). The index may also have to be rebalanced more often if a B* tree or similar data structure is used.

Of course, int's are easier on the eye when doing manual queries and report construction, and space consumption may add up through FK usages.

I'd be interested to see any measurements of how well e.g. SQL Server actually handles insert-heavy tables with IDENTITY PK's.

Pontus Gagge
A: 

We have Guids in our very complex enterprise software everywhere. Works smoothly.

I believe Guids are semantically more suitable to serve as identifiers. There is also no point in unnecessarily worrying about performance until you are faced with that problem. Beware premature optimization.

There is also an advantage with database migration of any sort. With Guids you will have no collisions. If you attempt to merge several DBs where ints are used for identity, you will have to replace their values. If these old values were used in urls, they will now be different following SEO hit.

User