views:

722

answers:

3

Hi, At the moment we have a number of tables that are using newid() on the primary key. This is causing large amounts of fragmentation. So I would like to change the column to use newsequentialid() instead.

I imagine that the existing data will remain quite fragmented but the new data will be less fragmented. This would imply that I should perhaps wait some time before changing the PK index from non-clustered to clustered.

My question is, does anyone have experience doing this? Is there anything I have overlooked that I should be careful of?

A: 

If this is SQL Server, you are generating a Guid by calling newid(). This is not good for primary keys. Use an integer identity column for the primary key, and make your Guid a surrogate key (and a row guid column).

cdonner
That is why he is asking about switching to newsequentialid() instead.
Rex M
+2  A: 

You might think about using comb guids, as opposed to newsequentialid.

cast(
    cast(NewID() as binary(10)) +
    cast(GetDate() as binary(6))
as uniqueidentifier)

Comb guids are a combination of purely random guids along with the non-randomness of the current datetime, so that sequential generations of comb guids are near each other and in general in ascending order. Comb guids have various advantages over newsequentialid, including the facts that they are not a black box, that you can use this formula outside of a default constraint, and that you can use this formula outside of SQL Server.

Justice
I thought about them, although I guess I would prefer to go with the 'supported' feature. I suppose that I could theoretically switch a table from newid() to COMBs without a problem though as the datatypes are the same.
cbp
+2  A: 

If you switch to sequentialguids and reorganize the index once at the same time, you'll eliminate fragmentation. I don't understand why you want to just wait until the fragmented page links rearrange themselves in continuous extents.

That being said, have you done any measurement to show that the fragmentation is actually affecting your system? Just looking at an index and seeing 'is fragmented 75%' does not imply that the access time is affected. There are many more factors that come into play (buffer pool page life expectancy, rate of reads vs. writes, locality of sequential operations, concurrency of operations etc etc). While switching from guids to sequential guids is usualy safe, you may introduce problems still. For instance you can see page latch contention for an insert intensive OLTP system because it creates a hot-spot page where the inserts accumulate.

Remus Rusanu
Thanks Remus - interesting comment. I have heard about the 'page latch contention' once before but it seems to be rarely mentioned in the arguments for and against GUIDs.You are right, I was probably wrong in my thinking about waiting before switching to a clustered index.
cbp
A system would have to have enough processors and enough inserts/second for page latch contention to show up, so is far less mentioned. Its ironic though cuz its mitigation is to split the one hot spot into multiple warm spots (ie. insert around several values rather than one value), ultimately causing fragmentation.
Remus Rusanu