views:

5317

answers:

10

I've worked on a number of database systems in the past where moving entries between databases would have been made a lot easier if all the database keys had been GUID / UUID values. I've considered going down this path a few times, but there's always a bit of uncertainty, especially around performance and un-read-out-over-the-phone-able URLs.

Has anyone worked extensively with GUIDs in a database? What advantages would I get by going that way, and what are the likely pitfalls?

+8  A: 

Jeff has a post about it "Primary Keys: IDs versus GUIDs".

John
+4  A: 

The main advantages are that you can create unique id's without connecting to the database. And id's are globally unique so you can easilly combine data from different databases. These seem like small advantages but have saved me a lot of work in the past.

The main disadvantages are a bit more storage needed (not a problem on modern systems) and the id's are not really human readable. This can be a problem when debugging.

There are some performance problems like index fragmentation. But those are easilly solvable (comb guids by jimmy nillson: http://www.informit.com/articles/article.aspx?p=25862 )

Edit merged my two answers to this question

@Matt Sheppard I think he means that you can duplicate rows with different GUIDs as primary keys. This is an issue with any kind of surrogate key, not just GUIDs. And like he said it is easilly solved by adding meaningfull unique constraints to non-key columns. The alternative is to use a natural key and those have real problems..

Mendelt
+1  A: 

GUIDs may cause you a lot of trouble in the future if they are used as "uniqifiers", letting duplicated data get into your tables. If you want to use GUIDs, please consider still maintaining UNIQUE-constraints on other column(s).

Troels Arvin
-1, how the hell would a GUID be duplicated?
jcollum
This is the heart of the problem: Introducing a GUID makes any row unique. But the non-artificial parts of the rows may suddenly contain duplicates (several versions of the truth).
Troels Arvin
+1 to compensate. I see what you mean, but it's badly expressed.
Stefano Borini
A: 

@Troels Arvin - Can you please elaborate? How would I end up with duplicated data?

Matt Sheppard
+4  A: 

@Matt Sheppard:

Say you have a table of customers. Surely you don't want a customer to exist in the table more than once, or lots of confusion will happen throughout your sales and logistics departments (especially if the multiple rows about the customer contain different information).

So you have a customer identifier which uniquely identifies the customer and you make sure that the identifier is known by the customer (in invoices), so that the customer and the customer service people have a common reference in case they need to communicate. To guarantee no duplicated customer records, you add a uniqueness-constraint to the table, either through a primary key on the customer identifier or via a NOT NULL + UNIQUE constraint on the customer identifier column.

Next, for some reason (which I can't think of), you are asked to add a GUID column to the customer table and make that the primary key. If the customer identifier column is now left without a uniqueness-guarantee, you are asking for future trouble throughout the organization because the GUIDs will always be unique.

Some "architect" might tell you that "oh, but we handle the real customer uniqueness constraint in our app tier!". Right. Fashion regarding that general purpose programming languages and (especially) middle tier frameworks changes all the time, and will generally never out-live your database. And there is a very good chance that you will at some point need to access the database without going through the present application. == Trouble. (But fortunately, you and the "architect" are long gone, so you will not be there to clean up the mess.) In other words: Do maintain obvious constraints in the database (and in other tiers, as well, if you have the time).

In other words: There may be good reasons to add GUID columns to tables, but please don't fall for the temptation to make that lower your ambitions for consistency within the real (==non-GUID) information.

Troels Arvin
Hear hear!Love your SQL comparison page btw. Extremely useful. The only thing I miss is a changelog.
Henrik Gustafsson
+41  A: 

Advantages:

  • Can generate them offline.
  • Makes replication trivial (as opposed to int's, which makes is REALLY hard)
  • ORM's usually like them
  • unique accross applications. So We can use the PK's from our CMS (guid) in our app (also guid) and know we are NEVER going to get a clash.

Disadvantages:

  • Larger space use, but space is cheap(er)
  • can't order by ID to get the insert order.
  • Can look ugly in a URL, but really, WTF are you doing putting a REAL DB key in a URL!?
  • Harder to do manual debugging, but not that hard

Personally, I use them for most PK's in any system of a decent size, but I got "trained" on a system which was replicated all over the place, so we HAD to have them. YMMV.

I think the duplicate data thing is rubbish - you can get duplicate data however you do it. Surrogate keys are usually frowned upon whereever I've been working. We DO use the wordpress-like system tho:

  • unique ID for the row (GUID/whatever). Never visible to the user.
  • public ID is generated ONCE from some field (eg the title - make it the-title-of-the-article)

Works for us :)

Nic Wise
[WTF are you doing putting a REAL DB key in a URL!?] Not sure why that bothers you. What else would you use? Look at Stack Overflow... It has IDENTITY values in the URL all over the place, and it works just fine. Using DB keys in URLs doesn't prevent you from enforcing security.
Euro Micelli
No, it doesn't, but things like SEO are usually better if there isn't a key in it - especially something as long as a GUID. Of course, it can be worked around easily, so I gues that was a bit of an over sweeping statement
Nic Wise
+8  A: 

This was also discussed here on Stack Overflow, and this is an excellent article on the web that lists the pros and cons.

Eric Z Beard
+1 for doing research
jcollum
+3  A: 

Why doesn't anyone mention performance? When you have multiple joins, all based on these nasty GUIDs the performance will go through the floor, been there :(

Andrei Rinea
+1  A: 

One other small issue to consider with using GUIDS as primary keys if you are also using that column as a clustered index (a relatively common practice). You are going to take a hit on insert because of the nature of a guid not begin sequential in anyway, thus their will be page splits, etc when you insert. Just something to consider if the system is going to have high IO...

WIDBA
A: 

stack array overflow advantages?

abdul salam