ansaurus

Question

Difference between creating Guid keys in C# vs. the DB

Answer 1

+5 A:

I think you are doing just fine by creating them on the client side. As you mentioned, if you let the db do it, you have to find some way (can't think of any really) to get that key. If you were using an identity, there are calls you can use to get the latest one created for a table, but I'm not sure if such exists for a guid.

Otávio Décio 2009-01-30 18:44:44

The process for retrieving a just-added PK is the same in most DBs regardless of the key's type (i.e. int, CHAR, GUID etc.).

MusiGenesis 2009-01-30 19:42:21

I don't think you should be speaking for "most DBs". one of the most prolific, Oracle, doesn't work like that at all.

2009-01-30 19:45:47

@Mark, of course it does,look up the RETURNING INTO clause

Ricardo Villamil 2009-01-30 20:17:50

Answer 2

+2 A:

By doing it in C# you might run the risk of reassigning the GUID and saving it back to the database. By having the database be responsible for it, you're guaranteed that this PK will not change, that is, if you set up the proper constraints. Having said that, you could set similar constraints in your C# code that prevent changing a unique id once it has been assigned, but you'd have to do the same in all of your applications...In my opinion, having it in C# sounds like is more maintenance than the database, since databases already have built in methods to prevent changing primary keys.

Ricardo Villamil 2009-01-30 18:44:53

Not sure why you got modded down. I don't think it's unreasonable to think that you could have a bug in your code. I think some c#'ers got offended at the suggestion.

2009-01-30 19:57:33

Hm, sorry if it came that way, I'm a hardcore C#'er and dindn't intend to say that, I just meant to say that constraints and db code you write once and in one place and forget about it, versus doing it in several C# apps.

Ricardo Villamil 2009-01-30 20:08:29

I didn't downmod, but I don't think the first sentence is accurate. The likelyhood of the same GUID being generated by two different clients is extremely low.

Randolpho 2009-01-30 21:14:01

I don't think Ricardo's comment is about the likelihood of generating the same GUID in different C# calls but rather about the vagaries of application coding versus concentrating your indexing functionality in the database.

Joe Soul-bringer 2009-01-30 21:32:31

No, but the risk of assigning 2 different GUIDs to a single row is a possibility, which is how I read that first sentence.

bart 2009-01-30 21:32:36

@Joe: Now that I read it with that interpretation in mind, I agree.

Randolpho 2009-01-30 21:39:08

Answer 3

+1 A:

GUIDs are horrible for performance

I would leave it in the database especially now that SQL Server has NEWSEQUENTIALID() which doesn't cause page splits on inserts anymore because the values are random, every NEWSEQUENTIALID created will be greater than the previous one...only caviat is that it can only be used as a default value

SQLMenace 2009-01-30 18:45:16

That thing about page splits, I guess, is true iff the GUID is a clusted index. The GUID may be the PK, but there may be another (more natural) column that's more appropriate to use as the clusted index.

ChrisW 2009-01-30 18:48:15

maybe but by default a PK is clustered and most people don't even realize that :-(

SQLMenace 2009-01-30 18:49:54

See http://stackoverflow.com/questions/170346/what-are-the-performance-improvement-of-sequential-guid-over-standard-guid for creating sequential guids in C#.

Michael Meadows 2009-01-30 19:39:35

GUIDs are not "horrible for performance" - this is a severe overstatement. A SELECT query that JOINs on a GUID will be slower than a similar query that JOINs on an int (GUID joins are about as slow as CHAR or VARCHAR JOINs). GUIDs also take up a bit more space than an int, but this is trivial.

MusiGenesis 2009-01-30 19:40:10

"A bit more space" is a severe understatement.

2009-01-30 19:47:51

@MusiGenesis create a table and then insert million rows with a GIUD PK then look at the fragmentation level of your table

SQLMenace 2009-01-30 19:52:49

@SQLMenace based on the comments above though, isn't the ultimate reason for the fragmentation that you're clustering on the wrong column, rather than the use of Guids for PKs in and of itself?

John Price 2009-01-30 20:00:55

@Mark Brady: yes, it's 4 times the size of an int, so "a bit more space" can be considered deceptive. GUIDs are 128bit, while ints are 32 bit. But given the likely size of a particular row, that's a drop in the bucket.

Randolpho 2009-01-30 21:09:58

@SQLMenace: there's nothing wrong with a GUID PK provided, as has been mentioned, you don't cluster on it. There are significant benefits to a GUID PK when you work in disconnected environment where merging records is a major portion of your DB use.

Randolpho 2009-01-30 21:11:39

@Randolpho: you can have the GUID as a unique column and have an integer ID, which gets you best of both worlds: speedy comparisons/insertions by using the int ID, ability to merge databases by using the GUID.

Joe Pineda 2009-01-31 05:16:24

Answer 4

A:

I let an empty Guid be an indicator that this object, although constructed, has not yet been inserted into (or retrieved from) the database.

ChrisW 2009-01-30 18:46:44

I do disagree with this approach. How do you determine inserted objects that have properties changed. Objects that are about to be deleted. Guid.Empty is not the best approach here.

Ray Booysen 2009-01-30 18:50:57

When necessary, we use a separate enum field on the entity that's one of New/UnModified/Modified/Deleted to explicitly keep track of it's state in the database.

John Price 2009-01-30 18:54:02

@Ray aamof I do a select to get what's currently in the database: then, comparing what I have with what I've just selected tells me what's been edited or deleted.

ChrisW 2009-01-30 18:56:10

Answer 5

A:

If you ever have to do an insert outside of the GUI (think import from another vendor or data from a company you bought and have to merge with your data), then the GUID would not automatically be assigned. It's not an insurmountable issue, but it is something to consider nonetheless.

HLGEM 2009-01-30 21:05:46

Answer 6

+2 A:

Interesting question.

Traditionally I too used the DB assigned guid but recently I was working on a Windows Mobile application and the SQL CE database doesn't allow for newguid so I had to do it in code.

I use SQL replication to get the data from the mobile devices to the server. Over the last 6 months I have had 40 SQL CE clients synchronise back over 100000 records to a SQL 2005 server without one missed or duplicated guid.

The additional coding required was negligible and the benefit of knowing the guid before inserting has in fact cut down on some of the complexity.

I haven't done any performance checking so performance aside I cannot see any reason not to implement guid handling as you suggest.

MBoy 2009-01-30 21:07:26

Answer 7

A:

As SQLMenace noted, standard GUIDs negatively affects indexing & paging. In C# you can generate sequential GUIDs like NEWSEQUENTIALID() using a little P/Invoke fun.

[DllImport("rpcrt4.dll", SetLastError = true)]
static extern int UuidCreateSequential(out Guid guid);

This way you can at least keep using GUIDs, but get more flexibility with how and where they are generated.

Joe Doyle 2009-01-30 21:23:29

Answer 8

A:

Ok, time to chime in. I would say that generated GUIDs client-side for saving to the database is the best way to do things -- provided you happen to be using GUIDs as your PKs, which I only recommend in one scenario: disconnected environment.

When you are using a disconnected model for your data propagation (i.e. PDA/cellphone apps, laptop apps intended for limited connectivity scenarios, etc), GUIDs as PKs generated client-side are the best way to do it.

For every other scenario, you're probably better off with auto-increment identity PKs.

Why? Well, a couple reasons. First, you really do get a big performance boost by using a row-spanning clustered PK index. A GUID PK and a clustered index do not play well together -- even with NEWSEQUENTIALID, which, by the way, I think totally misses the point of GUIDs. Second, unless your situation forces you not to (i.e. you have to use a disconnected model) you really want to keep everything transactional and insert as much interrelated data together at the same time.

Randolpho 2009-01-30 21:52:18

Answer 9

A:

Aside from the clustering issue (which doesn't seem to be a problem if you set your indexes up properly),

GUID as indexes will always be terribly cluttered - there's no "proper" setup to avoid that (unless you use the NEWSEQUENTIALGUID function in the SQL Server engine).

The biggest drawback IMHO is size - a GUID is 16 byte, an INT is 4. The PK is not only stored in the tree of the primary key, but also ON EVERY non-clustered index entry.

With a few thousand entries, that might not make a big difference - but if you have a table with millions or billions of entries and several non-clustered indices, using a 16-byte GUID vs. a 4-byte INT as PK might make a HUGE difference in space needed - on disk and in RAM.

Marc

marc_s 2009-01-31 09:29:42

Again, though, my question wasn't "should we use GUIDs or not" (we've already had that discussion and went with GUIDs). Rather, I was curious whether there was any drawback to creating guids at the application layer rather than the DB layer.

John Price 2009-01-31 18:11:23

Yes, I understood that - but I felt you were missing an important point that strongly speaks against using GUIDs for your keys.

marc_s 2009-01-31 21:39:57

*IF* you really must stick with GUIDs, then yes, there's a BIG PLUS creating them on the database end - you can use the SQL Server 2005 "NEWSEQUENTIALGUID" feature and at least allieviate some of the drawbacks of fragmentation by creating sequential (e.g. ever-increasing) GUIDs.

marc_s 2009-01-31 21:41:13

If you're not clustering on your PK, I'm still not clear where the fragmentation problem is coming in. Presumably you should be clustering on a column that's sensible to cluster on, based on how you're going to use data in that table.

John Price 2009-02-01 00:36:54

Yes, if you have non-clustered PK, then using a GUID for a PK is less of a problem. But since the PK = clustering key is the default and used by the vast majority of folks, it usually is a problem.

marc_s 2009-02-01 10:28:05

ansaurus

tags:

views:

answers:

Difference between creating Guid keys in C# vs. the DB

related questions