Check this:

    // Note: the RFC 4122 variant bits force the top two bits of
    // getLeastSignificantBits() to "10", so the long is always negative and
    // its decimal string is always 20 characters ("-" plus 19 digits);
    // substring(3, 20) therefore always yields exactly 17 digits.
    List<String> list = new ArrayList<String>();
    for (int i = 0; i < 10000; i++) {
        String value = ("" + UUID.randomUUID().getLeastSignificantBits()).substring(3, 20);
        assertFalse(list.contains(value));
        assertTrue(value.length() < 18);
        list.add(value);
    }

This test passes like a charm. I am under the impression that it is slightly better to take the least significant bits rather than the most significant, because the most significant bits have 6 bits fixed for some info, while that is not the case with the least significant bits. Hence, on average we need to generate 2^29 UUIDs to get a collision on the most significant bits, but 2^32 on the least significant bits. Ref: SO Thread. Am I right in assuming that?

Now, here I am chopping two more of the most significant digits off the least significant bits I got from that method; I am using substring to do that. Notice I am chopping two digits and the sign character. Does that not mean that on average we now need to generate 2^31 UUIDs to get a collision?

Precisely, I am trying to generate a unique identifier that must not exceed 17 digits in length, and it must be an integer, though not in the sense of the Java type. How reliable is my approach?
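For illustration, here is a length-safe variant of the chopping idea: reduce the least significant bits modulo 10^17 and zero-pad to a fixed width. The `ShortId` class and the `Math.abs`/`String.format` combination are my own sketch, not the approach under discussion:

```java
import java.util.UUID;

public class ShortId {
    // Illustrative sketch: take the least significant bits, reduce modulo
    // 10^17, drop the sign with Math.abs, and zero-pad so the result is
    // always exactly 17 digits regardless of the magnitude of the long.
    static String next() {
        long lsb = UUID.randomUUID().getLeastSignificantBits();
        long v = Math.abs(lsb % 100000000000000000L); // remainder magnitude < 10^17
        return String.format("%017d", v);
    }

    public static void main(String[] args) {
        System.out.println(ShortId.next());
    }
}
```

Because the remainder's magnitude is always below 10^17, `Math.abs` is safe here even for `Long.MIN_VALUE`.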

Meta Information:

Actually, we are integrating with a legacy system, and we must provide a unique number of no more than 17 digits. They use it as a database unique key, I assume. We could also use a sequence in this case, and I proposed that in the first place, but they told me it would be better if I could come up with a random number instead, so that a consumer can't guess it.

As far as I know, with the type-4 implementation of UUID in Java we need to generate 2^61 UUIDs on average to get a collision. Does that not mean that we need to generate 2^32 to get a collision on the least significant bits, and 2^29 on the most significant bits? If yes, then is it not correct to assume that on average we need to generate 2^31 to get a collision on the least significant bits after chopping off the two leftmost digits?

I tried to use SecureRandom as well, but that also gives me a 19-digit value, so I end up chopping the first two digits of that too. Below is the code for that.

    List<String> list = new ArrayList<String>();
    Random random = new SecureRandom();
    for (int i = 0; i < 10000; i++) {
        // nextLong() returns a primitive long, so it must be converted to a
        // String before substring() can be applied. Note this assumes the
        // string is at least 19 characters long (true for negative 19-digit
        // values; a positive long below 10^18 would throw
        // StringIndexOutOfBoundsException here).
        String value = ("" + random.nextLong()).substring(2, 19);
        assertFalse(list.contains(value));
        assertTrue(value.length() < 18);
        list.add(value);
    }

The other option I can think of is to use the date in the format "yyMMddHHmmssSSS" plus 2 sequence digits. But that would be quite processor dependent, and guessable, I suppose, because I am not certain that the millisecond will have changed after 99 rounds. Maybe it will, but that would depend on processor speed. 99 simultaneous requests are quite unlikely, though.
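For what it's worth, the timestamp-plus-sequence idea could be sketched like this (the `TimeSeqId` class and the `AtomicInteger` counter are illustrative assumptions, not production code):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.concurrent.atomic.AtomicInteger;

public class TimeSeqId {
    // Sketch of the "yyMMddHHmmssSSS" + 2-sequence-digits idea from the
    // question. The counter wraps at 100, so more than 99 ids generated in
    // the same millisecond could collide -- exactly the caveat raised above.
    private static final AtomicInteger seq = new AtomicInteger();

    static String next() {
        String ts = new SimpleDateFormat("yyMMddHHmmssSSS").format(new Date());
        int n = seq.getAndIncrement() % 100;
        return ts + String.format("%02d", n); // 15 timestamp chars + 2 digits
    }

    public static void main(String[] args) {
        System.out.println(TimeSeqId.next());
    }
}
```

Note that the result is always exactly 17 characters, but it is guessable, as the question anticipates.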

A: 

The UUID algorithm is implementation specific.

Chopping the UUID up into smaller numbers is NOT going to preserve the same uniqueness spread or guarantees. Are the bits you're saving really that important?

Spence
+1  A: 

OK, several questions; I'll do my best:

  1. If you have a collision in the lower bits but not in the upper bits, you still have unique ids. The same goes for the opposite case. Therefore, you need 2^61 numbers to get a collision.
  2. With a probability of .5 you are chopping 3 digits, since the + sign is not written. Therefore, you have a total of about 10^18 ≈ 2^60 possible numbers, so the sample size for a collision is about 2^30.
  3. Let's see how you get your result: Random.nextLong() is effectively called twice, and then you remove some of the bits (random bits the PRNG created). I can't see how it is more reliable than calling Random.nextLong() or nextInt() directly.

If you need a 17 digit number, why not do the following:

    String id = String.valueOf(random.nextLong() % 1000000000000000000L);

Notice it won't give a symmetric distribution: since MAX_LONG is 9223372036854775807L, the numbers in the range [0, 223372036854775807] will have a slightly better chance of appearing.

Also, both your method and this one do not ensure unique ids.

David Rabinowitz
Thanks, pal. Okay, here: http://stackoverflow.com/questions/325443/generate-uuid-in-java. He is stating that if we don't chop anything and take the most significant bits as they are, we have to generate 2^29 UUIDs on average to get a collision. Is that wrong? If not, then for the least significant bits it should be 2^32. Am I correct? If yes, then how does it become 2^16 after chopping just 3 digits?
Adeel Ansari
I've fixed my answer: let's take 10^18 as the upper limit for all 17-digit numbers, which is roughly 2^60.
David Rabinowitz
+1  A: 

I suggest that you use either Random or SecureRandom to generate random bits and turn those into a number. That should be more portable.

I don't understand your point about chopping digits. Assuming that you are generating a 17 (decimal) digit number from enough bits of a long-cycle PRNG, you should have a chance of 1 in 10**17 of a collision for any given pair of generated numbers. If the source is good, and you use enough bits, then it is immaterial that you are "chopping" ...

It is not clear to me that 1 in 10**17 is good enough. It depends on how many numbers are going to exist (in your persistent store) at any given time. For instance, if you have 44 million numbers extant, the chance of a collision between at least one pair will be about 1%.

Try plugging some numbers into a Birthday Paradox Calculator.
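The 44-million figure above can be checked with the standard birthday-paradox approximation p ≈ 1 - exp(-n(n-1)/2N); this is a quick sketch, and the class name is mine:

```java
public class Birthday {
    // Approximate probability of at least one collision when drawing
    // n values uniformly from a space of N equally likely values:
    //   p ≈ 1 - exp(-n*(n-1) / (2*N))
    static double collisionProbability(double n, double N) {
        return 1 - Math.exp(-n * (n - 1) / (2 * N));
    }

    public static void main(String[] args) {
        // With N = 10^17 possible ids, ~45 million issued ids give ~1% risk.
        System.out.println(Birthday.collisionProbability(45e6, 1e17));
    }
}
```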

EDIT: I think what you need is a generator that gives you 64 bit pseudo random numbers with a long cycle length AND an absolute guarantee of no repetitions for more numbers than you could ever possibly generate. It must also be possible to persist the state of the generator and resume it. Then, to get a 17 decimal digit "random" number, get the next value from the generator and test if it is in the range 0 ... 10**17 - 1. If it is, use it, if not repeat.

If you manage the generator correctly, you never get a repetition for the lifetime of the system, and therefore there is zero risk of a collision. But it is important that you use a PRNG (not a true RNG) and that you pick a PRNG that has the right properties.

From what I can tell, the Random class offers a PRNG with a cycle length of 2**48; i.e. you should get 2**48 numbers (e.g. using the nextLong() method) before the sequence starts to repeat. OTOH, SecureRandom gives either truly random numbers or pseudo-random numbers with a very long cycle ... but with a small but non-zero chance of repeating a number on each call.
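The rejection step described in the EDIT above can be sketched as follows. Caveats: the 57-bit mask is my choice (it keeps the rejection rate low, since 2^57 ≈ 1.44 * 10^17), and SecureRandom stands in for the persisted full-cycle PRNG, so the no-repeat guarantee discussed above does not hold in this sketch:

```java
import java.security.SecureRandom;

public class RejectionId {
    // Rejection sampling: draw a non-negative 57-bit value and keep it only
    // if it falls in [0, 10^17 - 1], so accepted values are uniform over
    // exactly the 17-digit range (acceptance rate ~69%).
    private static final long LIMIT = 100000000000000000L; // 10^17
    private static final SecureRandom rnd = new SecureRandom();

    static long next() {
        long v;
        do {
            v = rnd.nextLong() & ((1L << 57) - 1); // non-negative, < 2^57
        } while (v >= LIMIT);
        return v;
    }

    public static void main(String[] args) {
        System.out.println(RejectionId.next());
    }
}
```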

Stephen C
Thanks for the explanation. I think getting a collision is quite unlikely with any of these algorithms, but still there is a chance. I am thinking of using `yyyyMMdd + db-seq(9)` for this; this way I am restricted to 999999999 entries per day. Or `yyMMddHHmmss + db-seq(5)`, which means restricting to 99999 entries per second. Both are fine for me. Thanks.
Adeel Ansari
For the last case, the BPC tells me you get a 1% chance of a collision after 45 numbers have been issued; i.e. if your average rate of issuing is 45 per second, there is a 1% chance of a collision in any given second.
Stephen C
Do you mean getting a long from SecureRandom and chopping that to get the 17-digit value?
Adeel Ansari