views:

78

answers:

3

Question: When you have a .NET GUID for inserting in a database, it's structure is like this:

60 bits of timestamp, 
48 bits of computer identifier,
14 bits of uniquifier, and
 6 bits are fixed, 
----
128 bits total

Now I have problem with a GUID, because it's a 128 bit number, and some of the DBs I'm using only support 64 bit numbers.

Now I don't want to solve the dilemma by using an autoincrement bigint value, since I want to be able to do offline replication.

So I got the idea of creating a locally unique identifier class, which is basically a GUID downsized to a 64 bit value.

I came up with this:

day  9 bit (12*31=372 d)
year 8 bit (2266-2010 = 256 y)
seconds  17 bit (24*60*60=86400 s)
hostname 12 bit (2^12=4096)
random 18 bit (2^18=262144)
------------------------
          64 bits total

My question now is: The timestamp is pretty much fixed at 34 bits, leaving me with 64-34=30 bits for the hostname + random number.

Now my question: 1) Would you rather increase the hostname-hash bitsize and decrease the random bitsize, or increase the random bitsize and decrease the hostname-hash bitsize.

2) Exists there a hash algorithm that reduces every string to n-Bits? n being ideally = 12 or as near as possible.

+1  A: 

Actually, .NET-generated GUIDs are 6 fixed bits and 122 bits of randomness.

You could consider just using 64 bits of randomness, with an increased chance of collision due to the smaller bit length. It would work better than a hash.

Stephen Cleary
There are various approaches; I also like the idea of a "node id" with a timestamp (no randomness). You can easily create a node id with any number of bits by XOR'ing a cryptographic hash (e.g., SHA1). The fewer bits, the higher the chance of a node id collision, of course. The "uniquifier" you mentioned is actually used by other Guid algorithms to handle system clocks going backwards, to keep the timestamps unique per node id. But at the end of the day, you'll be hard-pressed to find a solution that guarantees fewer collisions than pure randomness. Remember, that's all .NET Guids do...
Stephen Cleary
While the probability of 1/2^64 is still a very small number, I don't like the thought of a pure random number. But I thought over omitting the hostname hash altogether and just increase the random number to 30 bits. But that isn't a good idea, because for n offline clients, that would make the chance of collision go to 2^30*n. For 100 clients, that's only about one per 10 millions. With a lot of bad luck, one might just hit the jackpot there...
Quandary
1/2^64 == one in 18 septillion (one septillion == one trillion one trillions, or one million million millions). If you go the fully-random way...
Stephen Cleary
@Stephen: No, it's 1/2^32 due to the birthday effect.
erikkallen
@erikkallen: Actually, the probability of collision starts out extremely small (1/2^64) and grows as ids are generated. What you're thinking of is the number of ids generated before a collision is expected (the probability of collision crosses the 50% mark). At 2^32 (over 4 billion) ids, the probability of a collision is 50%. Thanks for bringing up the birthday effect, though; I did neglect it. :)
Stephen Cleary
+1  A: 

If space isn't a concern, then why don't you just use 2 columns that are 64bits wide, then split the guid in half using 8bytes for each then just convert those to your 64bit numbers and store it in 2 columns, then if you ever do need to upsize to another system, you'll still be unique you'll just need to factor in the rejoining of the 2 columns.

Paul Farry
Then I'll have to compare two numbers for every join. Doesn't that decrease performance by too much ?
Quandary
Well you'll be involving an extra column in your key[im assuming the guid is a key] so you'll have a slight change, but this way you don't lose the Guid on systems that can support it and you have a workaround for the ones that don't.
Paul Farry
A: 

Why write your own? Why not just generate a uniformly random number? It will do the job nicely. Just grab the first X digits where X is whatever size you want... say 64-bits.

See here for info about RAND() vs. NEWID() in SQL Server, which is really just an indictment of GUIDs vs. random number generators. Also, see here if you need something more random than System.Random.

mattmc3
Completely random numbers are not a good idea, IMHO. I don't want to be worrying about duplicates and strange errors as the database gets larger and larger. At least a timestamp needs to be integrated in some way. Though thinking about it, it might be wiser to leave the seconds out and just increase the random integer size. That way I can have a fairly long hostname hash and a fairly long random number.
Quandary