views:

89

answers:

4

Hi,

we are currently designing an API for our product. The API offers access to a graph consisting of relations between types, such as users, posts or the like.

Now, those objects are referenced by primary key id in our database, but since every object is in a different table, those ids collide without the information of the type of object (=table).

This might not seem to be a problem but it really is for us - the API design gets much more inconsistent with those ID collissions/type informations in the wrong place.

Now the idea of using UUIDs came up, and since we probably will move away from an SQL db to K/V store in the future, that might not be the worst idea, also UUIDs offer far better uniqueness and also would scale better. So implementing UUIDs in our API wouldn't be the worst thing ever from various perspectives.

However, for the transition period, we still need to access the objects by DB id, and the UUID should be generated from an id and allow to infer the id from the UUID vice versa.

Something like 550e8400-e29b-11d4-YYYY-XXXXXXXXXXXX came to mind, where X would be the primary key from the DB, and YYYY would be a code for the type of object.

Is there a "right" way to do this? Can i break anything with this approach? Saving additional UUID information in whole or part is not really an option.

Thanks for your insight, Philip

A: 

two things come to mind:

create a single sequence, and use that to populate all primary keys. This will not be a true UUID, but will be unique within the current system. (i.e. you will be fine in one instance, but sharing data may be an issue with other systems)

create a map table. something like EXTERNAL_KEY with internal_id, internal_table, external_id. Populate this with a row from ON INSERT triggers on every table, and use the external_id as the UUID in your application.

btw - i would advise strongly against creating an 'intelligent' key with substrings for certain values etc..

hth

Randy
The single sequence thing is not an option due to constraints on changing the existing db.The map table is actually a good idea, but the trigger aspect of it makes it very hard to sell.What would the arguments against intelligent keys for a transition period be? My thought was to make it look like a UUID (though it is not) so that we can move to uuids eventually without breaking the api...
Philip
A: 

By definition, a surrogate key has no relation to the data itself. So, whether you are using an integer or a guid, the presumption with a surrogate key is that there is no means to determine anything about the entity itself solely from the key. It is because of this, that you must have a unique constraint on something else in the table beyond the surrogate primary key. Thus, I would not try to bake any information into a new surrogate key schema. Instead, I would make the guids "unique" and use them just like the integer key. The argument against trying to bake information into the guid is simplicity and thereby cost effectiveness. I doubt that you will ever use the magic baked into the guid but it will make it a pain to generate them.

Now, by "unique" I am hinting at a problem with guids: they do not index well. While their uniqueness is an advantage, it is also a weakness in this regard. The common solution is to use what is called a COMB guid which replaces part of the guid with a datetime value. The guid is still mostly unique but a portion of it is now sequential and will index well.

Thomas
A: 

I don't know of any specific problems, but one recommendation would be to use version 4 GUIDs:

Version 4 UUIDs have the form xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx with any hexadecimal digits for x but only one of 8, 9, A, or B for y. e.g. f47ac10b-58cc- 4 372- a 567-0e02b2c3d479.

You could put your data in the "random" sections. Just claim that you have a really rubbish random number generator. You could also include a checksum to protect against data-corruption collisions, since your GUIDs will be near each other numerically.

Douglas
A: 

Maybe I'm being a bit too simplistic, but if you are willing to change the data type of the identifier being passed around, hows about creating a computed column, type varchar ( ? ) that holds ServerName/Instance.db_name.schema_name.table_name.id. Uniqueness is guaranteed given servername/Instance is unique to your environment, db to your instance, schema to your db, tble to your schema and id to your table - which they all should be.

I don't have ANY problem with de-normalizing data for this sort of use. What I have a huge problem with is anyone using a field such as this to ask questions - in code - such as, give me all of the records that came from Server02. This would be a string field and won't be so searchable, much like a GUID, but, depending on where / how they are being stored, outside of their originating table, would, I suspect be less of a cause of defragmentation in indexes to which they belong.

  • for the 'ordered' guids in sql server see Creating and Modifying Identifier Columns

    CREATE TABLE dbo.Globally_Unique_Data (guid uniqueidentifier CONSTRAINT Guid_Default DEFAULT NEWSEQUENTIALID() ROWGUIDCOL, Employee_Name varchar(60) CONSTRAINT Guid_PK PRIMARY KEY (guid) );

MaasSql