views:

2667

answers:

6

I'm in the process of designing a fairly complex system. One of our primary concerns is supporting SQL Server peer-to-peer replication. The idea is to support several geographically separated nodes.

A secondary concern has been using a modern ORM in the middle tier. Our first choice has always been Entity Framework, mainly because the developers like to work with it. (They love the LiNQ support.)

So here's the problem:

With peer-to-peer replication in mind, I settled on using uniqueidentifier with a default value of newsequentialid() for the primary key of every table. This seemed to provide a good balance between avoiding key collisions and reducing index fragmentation.

However, it turns out that the current version of Entity Framework has a very strange limitation: if an entity's key column is a uniqueidentifier (GUID) then it cannot be configured to use the default value (newsequentialid()) provided by the database. The application layer must generate the GUID and populate the key value.

So here's the debate:

  1. abandon Entity Framework and use another ORM:
    • use NHibernate and give up LiNQ support
    • use linq2sql and give up future support (not to mention get bound to SQL Server on DB)
  2. abandon GUIDs and go with another PK strategy
  3. devise a method to generate sequential GUIDs (COMBs?) at the application layer

I'm leaning towards option 1 with linq2sql (my developers really like linq2[stuff]) and 3. That's mainly because I'm somewhat ignorant of alternate key strategies that support the replication scheme we're aiming for while also keeping things sane from a developer's perspective.

Any insight or opinion would be greatly appreciated.

A: 

Why not use identity column? If you are doing merge replication you can have each system start at a separate seed and work in one direction (e.g. node a starts at 1 and adds 1, node b starts at 0 and subtracts one)...

I'm really trying to avoid this. I know it works, but... it makes me sad. SQL Server 2005 is all prepared with its fancy NewSequentialId() function, and MS is so gung-ho with their fancy new Entity Framework. But put the two together and *poof* you're back in 1999.
c.batt
That's great until you overflow the datatype. Or the seed gets reset and there are gaps in your IDs. Or you need to do inserts into synchronized, physically separate databases and merge them. (Which gets into log shipping and other out-of-band hacks). There's no good reason to use an integer ID anymore.
David Lively
+2  A: 

Huh? I think your three options are a false choice. Consider option 4:

4) Use the Entity Framework with non-sequential, client-generated GUIDs.

The EF can't see DB-server-generated GUIDs for new rows inserted by the framework itself, sure, but you don't need to generate the GUIDs on the DB server. You can generate them on the client when you create your entity instances. The whole point of a GUID is it doesn't matter where you generate it. As for GUIDs generated by a replicated DB, the EF will see them just fine.

Your client-side GUIDs won't be sequential (use Guid.NewGuid()), but they will be world-wide, guaranteed unique.

We do this in shipping, production software with replication. It does work.

Craig Stuntz
And generating sequential guid's on the client, which if running on a web server would all be the same box anyway, is not that hard.
Craig
Truly random GUIDs lead to table fragmentation and page splitting, which in turn degrade performance.I've found a few articles about generating decent sequential GUIDs in the middle tier. This appears to be the best option at the moment.
c.batt
You will only incur massive table fragmentation, if you also set the CLUSTERING KEY on the GUID column. You don't HAVE TO, though! You can keep the PRIMARY KEY on your GUID column, and use an ever-increasing IDENTITY column as your CLUSTERING KEY --> no fragmentation to speak of!
marc_s
Right, marc. This also gives you a "prettier" alternate key to use in URIs.
Craig Stuntz
+5  A: 

I second Craig's suggestion - option 4.

You can always use the GUID column, populated by the middle-tier, as your PRIMARY KEY (that's a LOGICAL construct).

To avoid massive index (thus: table) fragmentation, use some other key (ideally an INT IDENTITY column) as the CLUSTERING KEY - that's a physical database construct, which CAN be separated from the primary key.

By default, the primary key is the clustering key - but that doesn't have to be that way. In fact, I improved performance and drastically lowered fragmentation by doing just that on a database I "inherited" - add a INT IDENTITY column and put the clustering key on that small, ever-increasing, never-changing INT - works like a charm!

Marc

marc_s
OK. This is sounding alright. I'll give it a shot. Thanks.
c.batt
A: 

You can use stored procedures if you are really stuck on using NewSequentialID(). You can bind the result columns from the procedure to the appropriate property and once inserted the SQL-generated GUID will be fed back into the object.

Unfortunately you have to define SPs for all three operations (insert, update, delete) even though the other operations would complete properly using the defaults. You also need to maintain the SP code and ensure it is synchronized with your EF model as you make changes, which may make this option unattractive on account of the additional overhead.

There is a step-by-step example at http://blogs.msdn.com/bags/archive/2009/03/12/entity-framework-modeling-action-stored-procedures.aspx which is pretty straight-forward.

Malcolm
A: 

use newseqid with your own orm (it not that hard) with linq

haroon
+2  A: 

Another option (not available when this was posted) is to upgrade to EF 4, which supports server-generated GUIDs.

Craig Stuntz
Thanks for posting this and reminding me that EF 4 is avail. Too late for me to switch now (I no longer work at the place where it was an issue), but this sounds like the best solution from this point forward. (Even though your past suggestion did work just fine. Thanks, btw.)
c.batt