views:

398

answers:

2

I'm upsizing a backend MS Access database to SQL Server. The front-end client will remain an Access application for the time being (has about 30k lines of code).

The aim is to eventually allow synchronisation of the database accross multiple servers (not using replication but probably the sync framework).
Currently, all Primary Keys in the Access tables are autoincrement integer surrogates.

I'm not asking about the process of upsizing but about whether I should use GUID or another codification for the PK (I know I could split the number range accross servers, but I don't want to do that and allow the PK to be created on the client if necessary, for instance in Offline mode).

GUID

Pro:

  • standardised format.
  • uniqueness assured (practically anyway)

Cons:

  • not easy to manipulate in Access, especially when using them as filters for subforms or in controls.
  • degrade INSERT performance due to their randomness.
  • has more than one representation: string, canonical form, binary that need to be converted.

Custom codification scheme

I thought that maybe a scheme using a more uniform code as PK would avoid the performance penalty and, most importantly, ensure that the PK remains usable in form controls where necessary (and not require these conversions to/from string).

My idea for a codification scheme would be along the lines of a 10 character code split into:

  • 8 digits for a timestamp
  • 4 digits for a unique client ID
  • 2 digits as a random number for potential colisions Each digit would be in base 34, composed of letters from A-Z and 2-9, avoiding O, 0, 1, I because of their similitude (in case we need to manually handle these PK, for instance during debugging).

Pro:

  • easier to handle manually when the case arises.
  • don't require conversion between different representation since it's basically a string (so less existing code to adapt).
  • uniqueness assured (practically)

Cons:

  • performance in JOIN hasn't been proven
  • performance in INSERT should be faster than GUID but not proven
  • Each server/client machine must be set its own UID, although that should not be too much of an issue.

So, should I use GUID or another scheme for my PK?

+2  A: 

How many records are you planning on? Is bigint not big enough? Up to 9,223,372,036,854,775,807 records (if you don't include the negatives)

If it is only for inserts, and no selects on the data, go for what ever scheme, (i would still say bigint or GUID/uniqueidentifier). If you need to do selects, an int, or bigint is much faster than GUID or any other custom PK.

Gidon
The problem with autoincrements is that they are difficult to manage if you want your applications to work offline or accross multiple servers. You need to allocate ranges and that isn't something I want to manage and it can become hairy.
Renaud Bompuis
Than I tend to agree with the answer from TcKs, the theoretical performance problem on insert is in my opinion less important than the pros of using a standard build in datatype, which is easier to use and maintain than any custom scheme.
Gidon
No - GUIDs as your primary or more exactly your clustering key are EVIL all around! They not only give you a performance hit on INSERT, they also MASSIVELY fragment your tables - ALL your performance will suffer.
marc_s
Check out Kimberly Tripp's excellent article on the never-ending clustered key debate at http://sqlskills.com/BLOGS/KIMBERLY/post/The-Clustered-Index-Debate-Continues.aspx
marc_s
Also, Access doesn't recognize BIGINT. You have to do workarounds to make results with a BIGINT PK editable.
David-W-Fenton
+1  A: 

not easy to manipulate in Access, especially when using them as filters for subforms or in controls.

-> Access has GUID as Number->Replication Identificator. We have application in Access with every PK as GUID and we haven't any problem with filters (and with filters for subfroms too).

degrade INSERT performance due to their randomness.

-> If you have performance problem based od this, you can have cluster index on another column (timestamp for example). But MSSQL server has two function for generating new GUID values - "newid()" and "newsequenceid()". The second methods - as name says - generates new values in sequence, so the insert performace issue should not happens.

has more than one representation: string, canonical form, binary that need to be converted.

-> its "PRO" in my sight :). But for users-developer and users-admins is in Access and MSSQL represented and consumed as string..

The GUID is in core "only" 128bit number. I don't think you should worry about efectivity of JOINs on GUID columns. The joining GUID columns is much more eficient than conditions on text columns for example.

I don't hink the Custom codification scheme is good idea, because you must solve many things. On other hand the GUID is standardly used and tools are ready to use it.

TcKs
How do you create sequential GUID from within your Access app? I need to keep this capability on the client as well so I can have an Offline mode where the client may be disconnected from the SQL server.
Renaud Bompuis
When you are connected, you can call "newsequenceid()" from access on mssql.But, when you are disconnected, then is "newsequenceid()" function unavailable.But I realy think, the unsequenced IDs is not problem. We used GUIDs as PK for years and we haven't performance problems with it.
TcKs
GUIDs are BAD BAD BAD for the clustering index (which the primary key is by default). They lead to MASSIVE fragmentation and hurt performance all around. Also, since the clustering key is included in EVERY non-clustered key, their size (16 byte vs. 4 byte for INT) IS a problem!
marc_s
The problems with GUIDs in Access are outlined here: http://trigeminal.com/usenet/usenet011.asp?1033 . There are workarounds, of course, but I'd avoid GUIDs in any app with an Access front end if it is possible.
David-W-Fenton
@marc_c: As I wrote - clustered index can be anything else. And the difference between 16B and 4B is on table with 1 000 000 records only 8MB. The cost of 8MB is very, very small. What problem do you mean?
TcKs
@David W. Fenton: Interesting. You should read #17: Can you rely on system tables...? (http://trigeminal.com/usenet/usenet017.asp?1033): it explains why ACE is not Jet (hint: because some code written against Jet is broken by ACE; if ACE was de facto Jet 5.0 then this wouldn't have been permitted).
onedaywhen