views:

406

answers:

10

As much as I like using GUIDs as the unique identifiers in my system, it is not very user-friendly for fields like an order number where a customer may have to repeat that to a customer service representative.

What's a good algorithm to use to generate order number so that it is:

  • Unique
  • Not sequential
  • Numeric values only
  • < 10 digits
  • Can be generated in the middle tier without doing a round trip to the database.

UPDATE (12/05/2009) After carefully reviewing each of the answers posted, we decided to randomize a 9-digit number in the middle tier to be saved in the DB. In the case of a collision, we'll regenerate a new number.

+7  A: 

If the middle tier cannot check what "order numbers" already exists in the database, the best it can do will be the equivalent of generating a random number. However, if you generate a random number that's constrained to be less than 1 billion, you should start worrying about accidental collisions at around sqrt(1 billion), i.e., after a few tens of thousand entries generated this way, the risk of collisions is material. What if the order number is sequential but in a disguised way, i.e. the next multiple of some large prime number modulo 1 billion -- would that meet your requirements?

Alex Martelli
+1 This is a linear congruential random number generator. Note that the sequence can be deduced from a few numbers, it is only obfuscated.
starblue
You can't really deduce a sequence if there is a real possibility that you're missing some numbers. E.g. "1 ? 2 ? 4" could very well be sequential.
MSalters
+1  A: 

Maybe you could try generating some unique text using a markov chain - see here for an example implementation in Python. Maybe use sequential numbers (rather than random ones) to generate the chain, so that (hopefully) the each order number is unique.

Just a warning, though - see here for what can possibly happen if you aren't careful with your settings.

a_m0d
+1 for the Automated Curse Generator link.
Ryan Lynch
+1  A: 

One solution would be to take the hash of some field of the order. This will not guarantee that it is unique from the order numbers of all of the other orders, but the likelihood of a collision is very low. I would imagine that without "doing a round trip to the database" it would be challenging to make sure that the order number is unique.

In case you are not familiar with hash functions, the wikipedia page is pretty good.

momeara
"Very low" is optimistic: with just 1 billion possible results, the chance of at least one collision for 60,000 orders is over 16%. Cfr the **Cast as a collision** paragraph in http://en.wikipedia.org/wiki/Birthday_paradox : probability of no collisions in n orders (with the best possible hash: one as good as perfect randomness!) is about `exp(-n*(n-1)/(2.0*10e9))`.
Alex Martelli
+1  A: 

You could base64-encode a guid. This will meet all your criteria except the "numeric values only" requirement.

Really, though, the correct thing to do here is let the database generate the order number. That may mean creating an order template record that doesn't actually have an order number until the user saves it, or it might be adding the ability to create empty (but perhaps uncommitted) orders.

Joel Coehoorn
In case anyone is interested, here an article on how to base64-encode a GUID: http://www.codinghorror.com/blog/archives/000409.html
Jason
I'd love to hear a customer try to read base64 over the phone...
Ryan Lynch
Ha, nvm. I was thinking of Ascii85...
Ryan Lynch
A: 

The straightforward answer to most of your bullet points:

Make the first six digits a sequentially-increasing field, and append three digits of hash to the end. Or seven and two, or eight and one, depending on how many orders you envision having to support.

However, you'll still have to call a function on the back-end to reserve a new order number; otherwise, it's impossible to guarantee a non-collision, since there are so few digits.

gw
+1  A: 

Use primitive polynomials as finite field generator.

ralu
+5  A: 

<Moan>OK sounds like a classic case of premature optimisation. You imagine a performance problem (Oh my god I have to access the - horror - database to get an order number! My that might be slow) and end up with a convoluted mess of psuedo random generators and a ton of duplicate handling code.</moan>

One simple practical answer is to run a sequence per customer. The real order number being a composite of customer number and order number. You can easily retrieve the last sequence used when retriving other stuff about your customer.

James Anderson
+1. True. We use the sequence per customer approach. It works flawless. You have less code to worry. :)
Guru
@James: It's not that we are scared of accessing the - gasp - database. It's because you can have a race condition between 2 orders being assigned the same order number if you go to the database and return to the middle tier to find out the next logical order number.
Jason
Table with 1 row 1 column. 1 command increments and returns the result. Everything goes through the command. No worries about a race.
s_hewitt
Most DBMSes suport a "sequence" feature, and most DBMSes also support sub-transactions or spearate units of work. Just get your next sequence outside the normal UOW -- you will end up with gaps in the sequence becuase of cancelled UOWs aborted oreders etc. but that doesnt really matter. You will certainly not have race conditions.
James Anderson
A: 

We do TTT-CCCCCC-1A-N1.

  • T = Circuit type (D1E=DS1 EEL, D1U=DS1 UNE, etc.)
  • C = 6 Digit Customer ID
  • 1 = The customer's first location
  • A = The first circuit (A=1, B=2, etc) at this location
  • N = Order type (N=New, X=Disconnect, etc)
  • 1 = The first order of this kind for this circuit
Josh Einstein
+1  A: 

One simple option is to use the date and time, eg. 0912012359, and if two orders are received in the same minute, simply increment the second order by a minute (it doesn't matter if the time is out, it's just an order number).

If you don't want the date to be visible, then calculate it as the number of minutes since a fixed point in time, eg. when you started taking orders or some other arbitary date. Again, with the duplicate check/increment.

Your competitors will glean nothing from this, and it's easy to implement.

Will
To increase the granularity you can add seconds and milliseconds. But you still will run into race conditions if you don't have a single synchronized order number factory.
akr
+1  A: 

Your 10 digit requirement is a huge limitation. Consider a two stage approach.

  1. Use a GUID
  2. Prefix the GUID with a 10 digit (or 5 or 4 digit) hash of the GUID.

You will have multiple hits on the hash value. But not that many. The customer service people will very easily be able to figure out which order is in question based on additional information from the customer.

John