I'm developing an ASP.NET application that has some potentially large data tables, and I would like to know the best way to define a primary key. I know this has been asked before, but since this is for a specific situation, I think the question is valid.

I'm using Entity Framework 4 on a SQL Server 2008 database.

What are the possibilities for defining a primary key, considering the following:

  1. There is a real possibility that over time the number of records will exceed the 32-bit boundary, so a 32-bit auto-increment integer will not be sufficient.
  2. There is no possibility to define a primary key on a combination of other columns in the table.
  3. For data synchronization reasons, an application-generated ID would be preferable over a database-generated ID. Also, in EF a database-generated ID would mean an extra round trip to the database to retrieve the newly generated value.
  4. For insert performance, a sequential key would be preferable.
  5. I consider the space requirements for a (sequential) GUID a downside.
  6. For string IDs, case insensitivity would be preferable.

What I've come up with so far is a custom algorithm that generates a datetime part and a random part, converted to a hexadecimal string representation. This leaves me with a slightly shorter string than a GUID. I could shorten it further by converting it to Base64, but Base64 is case-sensitive, so that would go against item 6.
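The question does not give the exact layout of that custom ID, so the following is only a minimal sketch of the idea, assuming a 48-bit millisecond timestamp followed by 32 random bits. The name `make_id` and both field widths are hypothetical choices for illustration, not the asker's actual algorithm.

```python
import os
import time

TIMESTAMP_BITS = 48  # assumed width: milliseconds since the Unix epoch
RANDOM_BYTES = 4     # assumed width: 32 random bits


def make_id(now_ms=None):
    """Return a fixed-width lowercase hex ID: timestamp part, then random part."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Keep only the low 48 bits so the timestamp part is always 12 hex digits.
    timestamp_part = now_ms & ((1 << TIMESTAMP_BITS) - 1)
    random_part = int.from_bytes(os.urandom(RANDOM_BYTES), "big")
    # Zero-padded hex keeps string order equal to timestamp order.
    return f"{timestamp_part:012x}{random_part:08x}"


if __name__ == "__main__":
    print(make_id())
```

With these widths the ID is 20 hex characters, against 32 for a GUID written without dashes. IDs created in the same millisecond are ordered only by their random part, and two of them can collide, so the random width would have to be chosen to fit the expected insert rate.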

Thanks for your suggestions.

+11  A: 

You could consider storing your key as a BIGINT (8 byte integer).

BIGINT works exactly like INT, and can be used in an auto-incrementing identity column in the same way.
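To put the extra headroom in numbers, here is a small back-of-the-envelope sketch. The insert rate of 1,000 rows per second is an arbitrary assumption for illustration, and `years_until_exhausted` is a hypothetical helper name.

```python
INT_MAX = 2**31 - 1      # largest value of a signed 32-bit INT
BIGINT_MAX = 2**63 - 1   # largest value of a signed 64-bit BIGINT

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_until_exhausted(max_value, inserts_per_second):
    """Years until an identity starting at 1 reaches max_value at a steady rate."""
    return max_value / inserts_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = 1000  # assumed sustained inserts per second
    print(years_until_exhausted(INT_MAX, rate))     # about 0.07 years (roughly 25 days)
    print(years_until_exhausted(BIGINT_MAX, rate))  # on the order of 290 million years
```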

Ed Harper
I guess a bigint will work best. I will just have to work around data migration / synchronization in another way.
Jappie
For the data synchronization, perhaps you could create a primary key consisting of two columns: one column for the ID and one column with a machine name, or something like that? That way the ID is incremental and you can sync it across multiple machines without duplicate-key errors. The downside is the space requirement, as it will probably take up more space than a GUID.
Jan_V
A: 

I would use a sequential GUID in your case.

  • it is a surrogate key
  • it is application generated, so no need to retrieve a database generated id after inserts
  • it is sequential and will work well with clustered indexes
  • if you may overflow 32-bit keys you will probably have to use 64-bit keys anyway (unless you manage to create and use 48-bit keys or something like that), in which case 128-bit GUIDs only require twice the space
  • string surrogate keys are somewhat unnatural to me and I can see no benefit over GUID keys
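One common way to generate such a value in the application is the so-called COMB approach: random bytes followed by a timestamp. The sketch below is an illustration under assumptions, not a definitive implementation. It puts a 48-bit millisecond timestamp in the last six bytes because SQL Server is generally described as comparing those bytes of a `uniqueidentifier` first, which should be verified against the target server. The name `comb_guid` is hypothetical, and the result is not a versioned RFC 4122 UUID.

```python
import os
import time
import uuid


def comb_guid(now_ms=None):
    """Return a 16-byte GUID: 10 random bytes, then a 6-byte big-endian timestamp."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    random_part = os.urandom(10)
    # Low 48 bits of the millisecond timestamp, most significant byte first.
    timestamp_part = (now_ms & 0xFFFFFFFFFFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + timestamp_part)


if __name__ == "__main__":
    print(comb_guid())
```

The timestamp shows up as the final 12 hex digits of the printed GUID, so values generated later end in larger digits. Values created within the same millisecond are not ordered relative to each other.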
Daniel Brückner
I've never heard of applications being able to generate pseudo-sequential GUIDs... and even so, a GUID is 16 bytes and a BIGINT is 8 bytes, so it's still quite wasteful to use a GUID, sequential or not. Plus, that waste of space is replicated into all nonclustered indexes on the table, too, so it's a lot worse than first meets the eye.
marc_s
+1  A: 

Here are a couple of thoughts.

  • Consider the binary data type of size 5 or 6 bytes.
  • Do not overlook the benefits of partitioned tables especially for large tables.
  • Keep the remaining columns as small as possible. Sometimes the star schema can help with this.

Unfortunately, you cannot make binary columns identity columns. But you could use a max(Id)+1 insert strategy. I am not that familiar with .NET's Entity Framework, but there should be a way to retrieve the key on the same trip. I have seen documentation in the past explaining how to map entities to stored procedures and retrieve keys from them, but I do not have any specifics.
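As a rough sketch of what a 6-byte binary key with max(Id)+1 could look like, the snippet below encodes the counter big-endian so that byte-wise comparison matches numeric order. The helper names `to_binary_key` and `next_key` are hypothetical, and a real implementation would need to serialize the read of the current maximum and the insert, since two concurrent writers could otherwise compute the same next value.

```python
KEY_BYTES = 6  # assumed key width; 6 bytes hold values up to 2**48 - 1


def to_binary_key(value):
    """Encode a non-negative integer as a fixed-width big-endian key."""
    # Raises OverflowError if the value does not fit in KEY_BYTES bytes.
    return value.to_bytes(KEY_BYTES, "big")


def next_key(current_max):
    """Return the key that follows current_max (the max(Id)+1 step)."""
    return to_binary_key(int.from_bytes(current_max, "big") + 1)


if __name__ == "__main__":
    key = to_binary_key(255)
    print(key.hex(), next_key(key).hex())
```

A 6-byte key saves 2 bytes per row over a BIGINT and, because it is the key, the same saving applies wherever the key is repeated in other indexes.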

Brian Gideon
I think that in EF 4.0 they added support for using a binary column as an entity key as well. I remember seeing that in the release notes for the RC.
zeeshanhirani