views:

1382

answers:

7

Hello! Up until now i've been using the C# "Guid = Guid.NewGuid();" method to generate a unique ID that can be stored as the ID field in some of my SQL Server database tables using Linq to SQL. I've been informed that for indexing reasons, using a GUID is a bad idea and that I should use an auto-incrementing Long instead. Will using a long speed up my database transactions? If so, how do I go about generating unique ID's that are of type Long?

Regards,

+3  A: 

A long (big int in sql server) is 8 bytes and a Guid is 16 bytes, so you are halving the number of the bytes sql server has to compare when doing a look up.

For generating a long, use IDENTITY(1,1) when you create the field in the database.

so either using create table or alter table:

Field_NAME BIGINT NOT NULL PRIMARY KEY IDENTITY(1,1)

See comments for posting Linq to sql

Kevin
When you submit your new record to the database using L2S leave the PK unpopulated. It will be populated by L2S as part of the submit so you'll be able to retrieve it from your record object after submission, should you so need to do so.
Lazarus
Is there a way of doing this using the SQL Server design view? My SQL scripting code isnt too great........
Goober
There is, but I don't know how to do it. Sorry :(
Kevin
+2  A: 

Both have pros and cons, it depends entirely on how you use them that matters.

Right off the bat, if you need identifiers that can work across several databases, you need GUIDs. There are some tricks with Long (manually assigning each database a different seed/increment), but these don't scale well.

As far as indexing goes, Long will give much better insert performance if the index is clustered (by default primary keys are clustered, but this can be modified for your table), since the table does not need to be reorganized after every insert.

As far as concurrent inserts are concerned however, Long (identity) columns will be slower then GUID - identity column generation requires a series of exclusive locks to ensure that only one row gets the next sequential number. In an environment with many users inserting many rows all the time, this can be a performance hit. GUID generation in this situation is faster.

Storage wise, a GUID takes up twice the space of a Long (8 bytes vs 16). However it depends on the overall size of your row if 8 bytes is going to make a noticable difference in how many records fit in one leaf, and thus the number of leaves pulled from disk during an average request.

David
plus if you use GUID as your clustering key, that value (twice as big as a BIGINT) is added to each and every entry in each and every non-clustered index, too - thus multiplying the waste of space, not just on disk, but also in SQL Server RAM.
marc_s
+1  A: 

You can debate GUID or identity all day. I prefer the database to generate the unique value with an identity. If you merge data from multiple databases, add another column (to identify the source database, possibly a tinyint or smallint) and form a composite primary key.

If you do go with an identity, be sure to pick the right datatype, based on number of expected keys you will generate:

bigint - 8 Bytes - max positive value: 9,223,372,036,854,775,807  
int    - 4 Bytes - max positive value:             2,147,483,647

Note "number of expected keys " is different than the number of rows. If you mainly add and keep rows, you may find that an INT is enough with over 2 trillion unique keys. I'll bet your table won't get that big. However, if you have a high volume table where you keep adding and removing rows, you row count may be low, but you'll go through keys fast. You should do some calculations to see how log it would take to go through the INTs 2 trillion keys. If it won't use them up any time soon go with INT, otherwise double the key size and go with BIGINT.

KM
+1  A: 

The "Queen of Indexing" - Kim Tripp - basically says it all in her indexing blog posts:

Basically, her best practices are: an optimal clustering key should be:

  • unique
  • small
  • stable (never changing)
  • ever-increasing

GUID's violate the "small" and "ever-increasing" and are thus not optimal.

PLUS: all your clustering keys will be added to each and every single entry in each and every single non-clustered index (as the lookup to actually find the record in the database), thus you want to make them as small as possible (INT = 4 byte vs. GUID = 16 byte). If you have hundreds of millions of rows and several non-clustered indices, choosing an INT or BIGINT over a GUID can make a major difference - even just space-wise.

Marc

marc_s
+1  A: 

Use guids when you need to consider import/export to multiple databases. Guids are often easier to use than columns specifying the IDENTITY attribute when working with a dataset of multiple child relationships. this is because you can randomly generate guids in the code in a disconnected state from the database, and then submit all changes at once. When guids are generated properly, they are insainely hard to duplicate by chance. With identity columns, you often have to do an intial insert of a parent row and query for it's new identity before adding child data. You then have to update all child records with the new parent identity before committing them to the database. The same goes for grandchildren and so on down the heirarchy. It builds up to a lot of work that seems unnecessary and mundane. You can do something similar to Guids by comming up with random integers without the IDENTITY specification, but the chance of collision is greatly increased as you insert more records over time. (Guid.NewGuid() is similar to a random Int128 - which doesn't exist yet).

I use Byte (TinyInt), Int16 (SmallInt), Int32/UInt16 (Int), Int64/UInt32 (BigInt) for small lookup lists that do not change or data that does not replicate between multiple databases. (Permissions, Application Configuration, Color Names, etc.)

I imagine the indexing takes just as long to query against regardless if you are using a guid or a long. There are usually other fields in tables that are indexed that are larger than 128 bits anyway (user names in a user table for example). The difference between Guids and Integers is the size of the index in memory, as well as time populating and rebuilding indexes. The majority of database transactions is often reading. Writing is minimal. Concentrate on optimizing reading from the database first, as they are usually made of joined tables that were not optimized properly, improper paging, or missing indexes.

As with anything, the best thing to do is to prove your point. create a test database with two tables. One with a primary key of integers/longs, and the other with a guid. Populate each with N-Million rows. Moniter the performance of each during the CRUD operations (create, read, update, delete). You may find out that it does have a performance hit, but insignificant.

Servers often run on boxes without debugging environments and other applications taking up CPU, Memory, and I/O of hard drive (especially with RAID). A development environment only gives you an idea of performance.

Lewis Moten
A: 

Consider creating sequential GUID from .NET application:

http://dotnet-snippets.de/dns/sequential-guid-SID998.aspx

http://stackoverflow.com/questions/170346/what-are-the-performance-improvement-of-sequential-guid-over-standard-guid

Boris Modylevsky