views:

277

answers:

8

Via this link, I know that a GUID is not good as a clustered index, but it can be uniquely created anywhere. It is required for some advanced SQL Server features like replication, etc.

Is it considered bad design if I want to have a GUID column as a typical Primary Key ? Also this assumes a separate int identity column for my clustering ID, and as an added bonus a "user friendly" id?

update

After viewing your feedback, I realise I didn't really word my question right. I understand that a Guid makes a good (even if its overkill) PK, but a bad clustering index (in general). My question more directly asked, is, is it bad to add a second "int identity" column to act as the clustering index?

I was thinking that the Guid would be the PK and use it to build all relationships/joins etc. Then I would instead of using a natural key for the Cluster Index, I would add an additional "ID" that not data-specific. What I'm wondering is that bad?

A: 

It is not bad design at all, an int Identity for your clustering key gives you a number of good benefits (Narrow,Unique,Ascending) whilst keeping the GUID for functionality purposes very separate and acting as your primary key.

If anything I would suggest you have the right approach, although the "user friendly" ID is the most questionable part - as in what purpose is it there to serve.

Addendum : I should put in the obligatory link to (possibly?) the most read article about the topic by Kimberley Tripp. http://www.sqlskills.com/BLOGS/KIMBERLY/post/GUIDs-as-PRIMARY-KEYs-andor-the-clustering-key.aspx

Andrew
All I meant by user-friendly is that its less to type for a user doing a query. I realize I could likely get away with only using an int as my PK, but I like the idea of the GUID as a main PK since it gives most flexibility down the road. Is that wrong?
Nate Bross
A: 

I think it is bad design to do it that way but I don't know if it is bad otherwise. Remember, SQLServer automatically assigns the clustered index to the Primary key. You would have to remove it after making the GUID the primary key. Also, you usually want your identity column to be your primary key. So doing what you are saying would confuse anyone who reads your code that doesn't look closely. I would suggest you make the ID column your primary key, identity column, and put the clustered index on it. Then make your GUID column a unique key, making it a non-clustered index and not allowing nulls. That in affect will do what you want but will follow more of the standard.

RandomBen
It only automatically assigns it through the table designer, in a create table script you have full control.
Andrew
Just do a `CONSTRAINT PK_table PRIMARY KEY NONCLUSTERED(GuidField)` in your create script for the table - and your PK is no longer clustered. Now add `CREATE CLUSTERED INDEX ......` and you're done. No muss, no fuss, no need to drop anything
marc_s
You can remove or not add the clustered-index to the PK but there is a reason why it does it for you. In general it is good practice. If someone else ever looks at the code and tries to use it will not be obvious unless they look closely how your code is working.
RandomBen
Just because SSMS provides a default, does not make it right. TLog expansion is set at 10% of a file that defaults to 1 meg - I can't imagine I would ever accept that on a production system. The defaults are there to help protect the database from some horrendous design flaws, but they are not infallible or always right.
Andrew
In this situation which would you build relationships based on?
Nate Bross
+1  A: 

Using a GUID is lazy -- i.e., the DBA can't be bothered to model his data properly. Also it offers very bad join performance -- typically (16-byte type with poor locality).

Is it a bad design, if I want to have a GUID column as my typical Primary Key, and a separate, int identity column for my clustering ID, and as an added bonus a "user friendly" id?

Yes it is very bad -- firstly you don't want more than one "artificial" candidate key for your table. Secondly, if you want a user friendly id to use as keys just use a fixed length type such as char[8] or binary(8) -- preferably binary as the sort won't use the locale; you could use 16-byte types however you will notice a deterioration in performance -- however not as bad as GUID's. You can use these fixed types to build your own user-friendly allocation scheme that preserves some locality but generates sensible and meaningful id's.


As an Example:

If you are writing some sort of a CRM system (lets say online insurance quotes) and you want an extremely user friendly type for example a insurance quote reference (QR) that looks like so "AD CAR MT 122299432".

In this case -- since the quote length huge -- I would create a separate LUT/Symboltable to resolve the quote reference to the actual identifier used. but I will divorce this LUT from the rest of the model, I will never use the quote reference anywhere else in the model, especially not in the table representing the QR's.

Create Table QRLut
{
    bigint bigint_id;
    char(32) QR;
}

Now if my model has one table that represents the QR and 20 other tables featuring the bigint QR as a foreign key -- the fact that a bigint is used will allow my DB to scale well -- the wider the join predicates the more contention is caused on the memory bus -- and the amount of contention on the memory bus determines how well your CPU's can be saturated (multiple CPU's).

You might think with this example that you could just place the user-friendly QR in the table that actually represents the quote, however keep in mind that SQL server gathers statistics on tables and indices, and you don't want to let the server make caching decisions based on the user-friendly QR -- since it is huge and wastefull.

Hassan Syed
So would you use an int/bigint as the Primary Key and the clustering index? And scrap the Guid completely?
Nate Bross
Yes, and if I would need a user friendly id (in your case a GUID) of some sort I would use an auxiliary table for it.
Hassan Syed
A: 

Personally, I would go this way:

An internally known identity field for your PK (one that isn't known to the end-user because they will inevitably want to control it somehow).
A user-friendly "ID" that is unique with respect to some business rule (enforced either in your app code or as a constraint).
A GUID in the future if it's ever deemed necessary (like if it's required for replication).

Now with respect to the clustered index, which you may or may not be confused about, consider this guide from MS for SQL Server 2000.

Austin Salonen
+1  A: 

What are you intending to accomplish with the GUID? The int identity column will also be unique within that table. Do you actually need or expect to need the ability to replicate? If so, is using a GUID actually preferable in your architecture over handling identity columns through one of the identity range mangement options?

If you like the "pretty" ids generated using the Active Record pattern, then I think I'd try to use it instead of GUIDs. If you do need replication, then use one of the replication strategies appropriate for identity columns.

tvanfosson
All I meant by 'pretty' is that its less to type for a user doing a query. I realize I could likely get away with only using an int as my PK, but I like the idea of the GUID as a main PK since it gives most flexibility down the road. Is that wrong?
Nate Bross
+1  A: 

Consider using only GUID, but get your GUIDs using the NEWSEQUENTIALID method (which allocates sequential values and so doesn't have the same clustering performance problems as the NEWID method).

A problem with using a secondary INT key as an index is that, if it's an adequate index, why use a GUID at all? If a GUID is necessary, how can you use an INT index instead? I'm not sure whether you need a GUID, and if so then why: are you doing replication and/or merging between multiple databases? And if you do need a GUID then you haven't specified exactly how you intend to use the non-globally-unique INT index in that scenario.


Sounds like what you are saying is that I have not made a good case for using a Guid at all, and I agree I know its overkill, but my question I guess would be is it too much overkill?

I think it's convenient to use GUID instead of INT for the primary key, if you have a use case for doing so (e.g. multiple databases) and if you can tolerate the linear, O(1) loss of performance caused simply by using a bigger (16-byte) key (which results in there being fewer index instances per page of memory).

The bigger worry is the way in which using a (random) GUID could affect performance when it's used for clustering. To counter-act that:

  • Either, use something else (e.g. one of the record's natural keys) as the clustered index, even if you still use a GUID for the primary key

  • Or, let the clustered index be the same field as the GUID primary key, but use NewSequentialId() instead of NewId() to allocate the GUID values.


is it bad to insert an additional artifical "id" for clustering, since I'm not sure I'll have a good natural ID candidate for clustering?

I don't understand why you wouldn't prefer to instead use just the GUID with NewSequentialId(), which is I think is provided for exactly this reason.

ChrisW
(+1) if your going to use GUID's definitely do this. But it still doesn't make GUIDs a nice solution imho.
Hassan Syed
I like the idea of a guid as PK, and int as a cluster index, but maybe the reson I like guid's is unfounded?
Nate Bross
read my post for some discussion -- GUID's are a waste of an adress space -- i.e., they start at a random location and you have no controll over the scheme used to generate them.
Hassan Syed
I want to be able to merge records from several databases. To do this I could use compound keys, where every record is identified by INT ServerId plus INT RecordId, but then I'd have to start administering ServerId values ... easier for me to just use GUID. On the other hand, high performance isn't a problem for me (I don't have so many records that I can't afford to use 16-byte GUID key fields instead of 4-byte INT key fields).
ChrisW
@Chris I understand, IMO only 30% of db work these days requires serious modelling attention. However for the 30% of the workload you might be interested in the diagrams on page 16 and 17 of this study: "what every programmer should know about memory" http://people.redhat.com/drepper/cpumemory.pdf - bad locality and 2x (compared to bigint) or 4x(compared to int) size = sad memory bus :D
Hassan Syed
@Hassan I use a natural key for the clustered index. For example, imagine that employees have a GUID as their ID, and each belong to one department, and that the SELECT is typically used to retrieve all employees within a department: I therefore define the DepartmentId as the clustered index, even though the EmployeeId is the primary key.
ChrisW
Sounds like what you are saying is that I have not made a good case for using a Guid at all, and I agree I know its overkill, but my question I guess would be is it too much overkill?
Nate Bross
@Nate I edited my answer to answer your latest question.
ChrisW
Thanks, is it bad to insert an additional artifical "id" for clustering, since I'm not sure I'll have a good natural ID candidate for clustering?
Nate Bross
@Nate I edited my answer to answer your latest question.
ChrisW
How would I do that? Set it as the default entry? That then restricts me from inserting a new record with my own Guid from my business layer, no?
Nate Bross
Yes, letting the DB assign it by setting it as the default entry is the most obvious way to use it. I think that's the way in which integer IDENTITY values are usually assigned too: they're not assigned by the business layer, instead the business layer inserts new records into the database with a NULL primary key and lets the database assign the primary key value.
ChrisW
+2  A: 

If you are going to create the identity field anyway, use that as the primary key. Think about querying this data. Ints are faster for joins and much easier to specify when writing queries.

Use the GUID if you must for replication, but don't use it as a primary key.

HLGEM
Do you think that the speed will make a lot of difference, given that this application will likely not exceed 200k records in the entire database?
Nate Bross
A: 

You are right that GUIDs make good object identifiers, which are implemented in a database as primary keys. Additionally, you are right that primary keys do not need to be the clustered indices.

GUIDs share the same characteristics for clustered indexes as INT IDENTITY columns, provided that the GUIDs are sequential. There is a NewSequentialID specific to SQL Server, but there is also a generic algorithm for creating them called COMB GUID, based on combining the current datetime with random bytes in a way that retains a large degree of randomness while retaining sequentiality.

One thing to keep in mind, if you intend to use NHibernate at some point, is that NHibernate natively knows how to use the COMB GUID strategy - and NHibernate can even use it to do batch-inserts, something that cannot be done with INT IDENTITY or NewSequentialID. If you are inserting multiple objects with NHibernate, then it will be faster to use the COMB GUID strategy than either of the other two methods.

Justice