tags:

views:

1270

answers:

9

We use Guids as primary keys for entities in the database. Traditionally, we've followed a pattern of letting the database set the ID for an entity during the INSERT, I think mostly because this is typically how you'd handle things using an auto-increment field or whatever.

I'm finding more and more that it's a lot handier to do key assignment in code during object construction, for two main reasons:

  1. you know that once an object's constructor has run, all of it's fields have been initialized. You never have "half-baked" objects kicking around.
  2. if you need to do a batch of operations, some of which depend on knowing an object's key, you can do them all at once without round-tripping to the database.

Are there any compelling reasons not to do things this way? That is, when using Guids as keys, is there a good reason to leave key assignment up to the database?

Edit: A lot of people have strong opinions on whether or not Guids should be used for PKs (which I knew), but that wasn't really the point of my question.

Aside from the clustering issue (which doesn't seem to be a problem if you set your indexes up properly), I haven't seen a compelling reason to avoid creating keys in the application layer.

+5  A: 

I think you are doing just fine by creating them on the client side. As you mentioned, if you let the db do it, you have to find some way (can't think of any really) to get that key. If you were using an identity, there are calls you can use to get the latest one created for a table, but I'm not sure if such exists for a guid.

Otávio Décio
The process for retrieving a just-added PK is the same in most DBs regardless of the key's type (i.e. int, CHAR, GUID etc.).
MusiGenesis
I don't think you should be speaking for "most DBs". one of the most prolific, Oracle, doesn't work like that at all.
@Mark, of course it does,look up the RETURNING INTO clause
Ricardo Villamil
+2  A: 

By doing it in C# you might run the risk of reassigning the GUID and saving it back to the database. By having the database be responsible for it, you're guaranteed that this PK will not change, that is, if you set up the proper constraints. Having said that, you could set similar constraints in your C# code that prevent changing a unique id once it has been assigned, but you'd have to do the same in all of your applications...In my opinion, having it in C# sounds like is more maintenance than the database, since databases already have built in methods to prevent changing primary keys.

Ricardo Villamil
Not sure why you got modded down. I don't think it's unreasonable to think that you could have a bug in your code. I think some c#'ers got offended at the suggestion.
Hm, sorry if it came that way, I'm a hardcore C#'er and dindn't intend to say that, I just meant to say that constraints and db code you write once and in one place and forget about it, versus doing it in several C# apps.
Ricardo Villamil
I didn't downmod, but I don't think the first sentence is accurate. The likelyhood of the same GUID being generated by two different clients is extremely low.
Randolpho
I don't think Ricardo's comment is about the likelihood of generating the same GUID in different C# calls but rather about the vagaries of application coding versus concentrating your indexing functionality in the database.
Joe Soul-bringer
No, but the risk of assigning 2 different GUIDs to a single row is a possibility, which is how I read that first sentence.
bart
@Joe: Now that I read it with that interpretation in mind, I agree.
Randolpho
+1  A: 

GUIDs are horrible for performance

I would leave it in the database especially now that SQL Server has NEWSEQUENTIALID() which doesn't cause page splits on inserts anymore because the values are random, every NEWSEQUENTIALID created will be greater than the previous one...only caviat is that it can only be used as a default value

SQLMenace
That thing about page splits, I guess, is true iff the GUID is a clusted index. The GUID may be the PK, but there may be another (more natural) column that's more appropriate to use as the clusted index.
ChrisW
maybe but by default a PK is clustered and most people don't even realize that :-(
SQLMenace
See http://stackoverflow.com/questions/170346/what-are-the-performance-improvement-of-sequential-guid-over-standard-guid for creating sequential guids in C#.
Michael Meadows
GUIDs are not "horrible for performance" - this is a severe overstatement. A SELECT query that JOINs on a GUID will be slower than a similar query that JOINs on an int (GUID joins are about as slow as CHAR or VARCHAR JOINs). GUIDs also take up a bit more space than an int, but this is trivial.
MusiGenesis
"A bit more space" is a severe understatement.
@MusiGenesis create a table and then insert million rows with a GIUD PK then look at the fragmentation level of your table
SQLMenace
@SQLMenace based on the comments above though, isn't the ultimate reason for the fragmentation that you're clustering on the wrong column, rather than the use of Guids for PKs in and of itself?
John Price
@Mark Brady: yes, it's 4 times the size of an int, so "a bit more space" can be considered deceptive. GUIDs are 128bit, while ints are 32 bit. But given the likely size of a particular row, that's a drop in the bucket.
Randolpho
@SQLMenace: there's nothing wrong with a GUID PK provided, as has been mentioned, you don't cluster on it. There are significant benefits to a GUID PK when you work in disconnected environment where merging records is a major portion of your DB use.
Randolpho
@Randolpho: you can have the GUID as a unique column and have an integer ID, which gets you best of both worlds: speedy comparisons/insertions by using the int ID, ability to merge databases by using the GUID.
Joe Pineda
A: 

I let an empty Guid be an indicator that this object, although constructed, has not yet been inserted into (or retrieved from) the database.

ChrisW
I do disagree with this approach. How do you determine inserted objects that have properties changed. Objects that are about to be deleted. Guid.Empty is not the best approach here.
Ray Booysen
When necessary, we use a separate enum field on the entity that's one of New/UnModified/Modified/Deleted to explicitly keep track of it's state in the database.
John Price
@Ray aamof I do a select to get what's currently in the database: then, comparing what I have with what I've just selected tells me what's been edited or deleted.
ChrisW
A: 

If you ever have to do an insert outside of the GUI (think import from another vendor or data from a company you bought and have to merge with your data), then the GUID would not automatically be assigned. It's not an insurmountable issue, but it is something to consider nonetheless.

HLGEM
+2  A: 

Interesting question.

Traditionally I too used the DB assigned guid but recently I was working on a Windows Mobile application and the SQL CE database doesn't allow for newguid so I had to do it in code.

I use SQL replication to get the data from the mobile devices to the server. Over the last 6 months I have had 40 SQL CE clients synchronise back over 100000 records to a SQL 2005 server without one missed or duplicated guid.

The additional coding required was negligible and the benefit of knowing the guid before inserting has in fact cut down on some of the complexity.

I haven't done any performance checking so performance aside I cannot see any reason not to implement guid handling as you suggest.

MBoy
A: 

As SQLMenace noted, standard GUIDs negatively affects indexing & paging. In C# you can generate sequential GUIDs like NEWSEQUENTIALID() using a little P/Invoke fun.

[DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(out Guid guid);

This way you can at least keep using GUIDs, but get more flexibility with how and where they are generated.

Joe Doyle
A: 

Ok, time to chime in. I would say that generated GUIDs client-side for saving to the database is the best way to do things -- provided you happen to be using GUIDs as your PKs, which I only recommend in one scenario: disconnected environment.

When you are using a disconnected model for your data propagation (i.e. PDA/cellphone apps, laptop apps intended for limited connectivity scenarios, etc), GUIDs as PKs generated client-side are the best way to do it.

For every other scenario, you're probably better off with auto-increment identity PKs.

Why? Well, a couple reasons. First, you really do get a big performance boost by using a row-spanning clustered PK index. A GUID PK and a clustered index do not play well together -- even with NEWSEQUENTIALID, which, by the way, I think totally misses the point of GUIDs. Second, unless your situation forces you not to (i.e. you have to use a disconnected model) you really want to keep everything transactional and insert as much interrelated data together at the same time.

Randolpho
A: 

Aside from the clustering issue (which doesn't seem to be a problem if you set your indexes up properly),

GUID as indexes will always be terribly cluttered - there's no "proper" setup to avoid that (unless you use the NEWSEQUENTIALGUID function in the SQL Server engine).

The biggest drawback IMHO is size - a GUID is 16 byte, an INT is 4. The PK is not only stored in the tree of the primary key, but also ON EVERY non-clustered index entry.

With a few thousand entries, that might not make a big difference - but if you have a table with millions or billions of entries and several non-clustered indices, using a 16-byte GUID vs. a 4-byte INT as PK might make a HUGE difference in space needed - on disk and in RAM.

Marc

marc_s
Again, though, my question wasn't "should we use GUIDs or not" (we've already had that discussion and went with GUIDs). Rather, I was curious whether there was any drawback to creating guids at the application layer rather than the DB layer.
John Price
Yes, I understood that - but I felt you were missing an important point that strongly speaks against using GUIDs for your keys.
marc_s
*IF* you really must stick with GUIDs, then yes, there's a BIG PLUS creating them on the database end - you can use the SQL Server 2005 "NEWSEQUENTIALGUID" feature and at least allieviate some of the drawbacks of fragmentation by creating sequential (e.g. ever-increasing) GUIDs.
marc_s
If you're not clustering on your PK, I'm still not clear where the fragmentation problem is coming in. Presumably you should be clustering on a column that's sensible to cluster on, based on how you're going to use data in that table.
John Price
Yes, if you have non-clustered PK, then using a GUID for a PK is less of a problem. But since the PK = clustering key is the default and used by the vast majority of folks, it usually is a problem.
marc_s