Hrm... here's where my CS knowledge lets me down. I want to write an algorithm that generates a reference number that is unique.

I don't want to use sequential numbers, as they introduce a security risk, and I want to use alphanumerics. The ref will have a min and max length too. (I can't use a GUID; it is too long.)

Ideally I don't want to query my persistence layer to see if a ref has been used before.

What strategies can I employ?

+2  A: 

If you're worried about security risks, then you want a cryptographically-secure random number generator. You should be able to tell it how many bytes you want (i.e. how long the number can be).
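A minimal sketch of that idea in Python, assuming an uppercase-plus-digits alphabet and a caller-chosen length (the name `random_ref` and the default length are made up for illustration). The standard-library `secrets` module draws from the OS's cryptographically secure generator:

```python
import secrets
import string

# Assumed alphabet: uppercase letters and digits (36 symbols).
ALPHABET = string.ascii_uppercase + string.digits


def random_ref(length: int = 10) -> str:
    """Return a random alphanumeric reference of the given length."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(random_ref(12))
```

Random values are unpredictable but not guaranteed unique, so the shorter the reference, the more a collision check (or a unique constraint in the store) matters.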

Roger Lipscombe
+1  A: 

If this number will ever be referenced by humans, I encourage you to follow these guidelines in your solution:

http://stackoverflow.com/questions/178572/what-is-the-best-format-for-a-customer-number-order-number

If you can't synchronize with the database to see what the next number will be, and you can't use GUIDs or a comparably long random string, then you need to include some sort of local value in the ID.

E.g., if all clients will be on a known network, you can end each number with the D block of the client's IP address.

Or, if clients have to log in and each user can log in only once at a time, you can include their user ID in the number somewhere.
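A rough sketch of the IP-address variant, assuming IPv4 clients and a random prefix of 8 characters (the function name, the prefix length and the hex encoding of the last octet are all illustrative choices, not part of the suggestion above):

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits


def ref_with_client_suffix(ip_address: str, random_length: int = 8) -> str:
    """Random alphanumeric part followed by the client's last IP octet in hex."""
    last_octet = int(ip_address.split(".")[-1])  # the "D block", 0..255
    random_part = "".join(secrets.choice(ALPHABET) for _ in range(random_length))
    return f"{random_part}{last_octet:02X}"


if __name__ == "__main__":
    print(ref_with_client_suffix("192.168.1.42"))  # ends in "2A"
```

The suffix only keeps different clients from colliding with each other. Each client still has to make sure it never repeats its own random part, for example by remembering what it has issued locally.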

Michael Haren
A: 

Truncate the GUID to the size you want.

If you're generating numbers, unless they are random and huge, you are better off checking to see if they've been used anyway.

Diodeus
I do not recommend this approach unless you add code to handle collisions.
Michael Haren
Truncating is asking for a world of hurt - see this post http://stackoverflow.com/questions/352674/creating-guids-with-a-set-prefix
Gavin Miller
+1  A: 

I'm taking a stab in the dark here, but you want a random value that will be unique yet shorter than 16 bytes. Your best bet is still a GUID, which is only 16 bytes. You want to use alphanumerics, so here are some options.

Use a GUID but encode it as base64. It looks like 7QDBkvCA1+B9K/U0vrQx1A, which is 22 characters: still longer than a native GUID, but shorter than the typical string representation.

See Text Encoding here: http://en.wikipedia.org/wiki/Globally_Unique_Identifier
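For what it's worth, the base64 option looks roughly like this in Python. It is only a sketch: the function names are made up, and it uses the URL-safe base64 alphabet (`-` and `_` instead of `+` and `/`), which is an assumption on my part and still not strictly alphanumeric:

```python
import base64
import uuid


def short_guid() -> str:
    """Encode a random GUID (16 bytes) as 22 base64 characters."""
    raw = uuid.uuid4().bytes
    # 16 bytes encode to 24 characters, the last two being '=' padding.
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def guid_from_short(text: str) -> uuid.UUID:
    """Reverse short_guid() by restoring the stripped padding."""
    return uuid.UUID(bytes=base64.urlsafe_b64decode(text + "=="))


if __name__ == "__main__":
    s = short_guid()
    print(s, guid_from_short(s))
```

Nothing is lost in the encoding, so the 22-character form is exactly as unique as the GUID itself.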

Another option would be to hash the GUID, but you will lose some of the uniqueness, so what is your tolerance level here for non-unique items?

==========

Assuming you have a single process inserting into the table, you could employ a HiLo algorithm and be confident you don't have to hit the DB each time. You'd simply store the last high value in memory; when the process starts up, you'd hit the DB to find out where you left off: http://stackoverflow.com/questions/282099/whats-the-hilo-algorithm

I still say a GUID is your best bet. 16 bytes is not bad and will be just as small as most alphanumeric solutions you come up with.
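A bare-bones HiLo sketch, to show the shape of it. `fetch_next_hi` is a hypothetical callable standing in for whatever reserves the next high value in your database, and the block size of 100 is arbitrary:

```python
import threading
from typing import Callable, Optional


class HiLoGenerator:
    """Hands out ids from a reserved block; asks for a new block only when empty."""

    def __init__(self, fetch_next_hi: Callable[[], int], block_size: int = 100):
        self._fetch_next_hi = fetch_next_hi  # hypothetical: one DB round trip
        self._block_size = block_size
        self._lock = threading.Lock()
        self._hi: Optional[int] = None
        self._lo = 0

    def next_id(self) -> int:
        with self._lock:
            # Reserve a new block on first use or when the current one is used up.
            if self._hi is None or self._lo >= self._block_size:
                self._hi = self._fetch_next_hi()
                self._lo = 0
            value = self._hi * self._block_size + self._lo
            self._lo += 1
            return value
```

Note that the ids it produces are still sequential within a block, so on its own this addresses the "don't hit the DB every time" part of the question, not the guessability part.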

JoshBerke
A: 

One way may be to generate the numbers from a smaller subset of numbers. For example, you could use a binary sequence to generate them based on a Gödel numbering: mapping 000 to 111 onto the exponents z, y, x of 5^z, 3^y, 2^x yields 1, 2, 3, 6, 5, 10, 15, 30.

Of course, this is overly simplistic. But by iterating over the "salt" numbers to generate the reference numbers, you wouldn't have to track the reference numbers at all. Provided, of course, you were reasonably sure you didn't have to factor in collisions.
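To make the toy example concrete, here is a small sketch that treats each bit of a counter as the exponent (0 or 1) of a prime and multiplies the results. The function name is made up, and note that the all-zero input gives the empty product, 1:

```python
# Primes assigned to bit positions x, y, z (least significant bit first).
PRIMES = (2, 3, 5)


def godel_number(bits: int) -> int:
    """Multiply together the primes whose bit is set in `bits`."""
    result = 1
    for position, prime in enumerate(PRIMES):
        if (bits >> position) & 1:
            result *= prime
    return result


if __name__ == "__main__":
    print([godel_number(n) for n in range(8)])
```

Because prime factorisations are unique, distinct inputs always map to distinct outputs, which is what lets you skip tracking the issued numbers.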

J.T. Hurley
A: 

If possible in your application/environment, have you considered adding the time as part of a pseudo-randomly generated number?

e.g. microtime() + rand(10000,99999)
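A Python rendering of the same idea might look like this. I'm assuming the `+` is meant as concatenation rather than addition, and I've swapped in `secrets` for the random part; the function name is invented:

```python
import secrets
import time


def time_based_ref() -> str:
    """Microsecond timestamp followed by a random five-digit number."""
    micros = time.time_ns() // 1000  # current time in microseconds
    suffix = secrets.randbelow(90000) + 10000  # 10000..99999 inclusive
    return f"{micros}{suffix}"


if __name__ == "__main__":
    print(time_based_ref())
```

Two references only collide if they are generated in the same microsecond and also draw the same suffix, but the timestamp part is guessable, which may matter given the security concern in the question.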

Karsten
A: 

I've been doing this in a production system with success:

  • Take the current time (UTC, with microsecond precision)
  • Your process id, thread id
  • Your computer name
  • A salt value (basically just a string unique to your program)
  • A random value (preferably from a crypto-grade PRNG)

Put this in memory, either as a string, or XOR the values together or something similar. Then:

  • Hash it with e.g. SHA-1
  • Reduce the resulting number mod 256^N (i.e. keep N bytes of the digest) to shrink the output to N bytes
  • Convert to hexadecimal or something printable if you need it.

Just be aware that shrinking the UID to N bytes will increase the chances of UID-collisions.

All the input data in the first list is to ensure that you get a unique base for hashing if you have a cluster of many computers. You can omit some of them, but you have to be certain that it contains something that makes it different for each computer you'll generate the UID on.
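Roughly, in Python, the recipe could be sketched like this. The salt string, the `|` separator and the function name are placeholders, and instead of a literal modulo it keeps the first N bytes of the digest, which serves the same purpose:

```python
import hashlib
import os
import secrets
import socket
import threading
import time

SALT = "my-application"  # placeholder: any string unique to your program


def make_uid(n_bytes: int = 8) -> str:
    """Hash time, process, thread, host, salt and randomness; keep n_bytes (max 20)."""
    parts = [
        str(time.time_ns()),         # current time, finer than microseconds
        str(os.getpid()),            # process id
        str(threading.get_ident()),  # thread id
        socket.gethostname(),        # computer name
        SALT,
        secrets.token_hex(16),       # crypto-grade random value
    ]
    digest = hashlib.sha1("|".join(parts).encode("utf-8")).digest()  # 20 bytes
    return digest[:n_bytes].hex()    # printable hex, 2 characters per byte


if __name__ == "__main__":
    print(make_uid())
```

With `n_bytes=8` you get 16 hex characters; going shorter makes collisions more likely, as noted above.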

csl