I've found a number of different questions on generating UIDs, but as far as I can tell, my requirements here are somewhat unique (ha).
To summarize: I need to generate a very short ID that's "locally" unique, but does not have to be "globally" or "universally" unique. The constraints are not simply aesthetic or space concerns; the ID is essentially being used as a hardware tag and is thus subject to the hardware's constraints. Here are the specifications:
Hard Requirements
- The ID must contain only decimal digits (the underlying data is BCD).
- The maximum length of the ID is 12 characters (digits).
- Must be generated offline - a database/web connection is not always available!
Soft Requirements
- We'd like it to begin with the calendar year and/or month. As this does waste a lot of entropy, I don't mind compromising on this or scrapping it entirely (if necessary).
- IDs generated from a particular machine should appear sequential.
- IDs do not have to sort by machine - for example, it's perfectly fine for machine 1 to spit out [123000, 124000, 125000], and machine 2 to spit out [123500, 123600, 124100].
- However, the more sequential-looking in a collective sense, the better. A set of IDs like [200912000001, 200912000002, 200912000003, ...] would be perfect, although this obviously does not scale across multiple machines.
Usage Scenario:
- IDs within the scope of this scheme will be generated from 10, maybe 100 different machines at most.
- There will not be more than a few million IDs generated, total.
- Concurrency is extremely low. A single machine will not generate IDs more often than every 5 minutes or so. Also, most likely no more than 5 machines at a time will generate IDs within the same hour or even the same day. I expect fewer than 100 IDs to be generated within one day on a given machine and fewer than 500 for all machines.
- A small number of machines (3-5) would most likely be responsible for generating more than 80% of the IDs.
I know that it's possible to encode a timestamp down to 100 ms or even 10 ms precision using fewer than 12 decimal digits, which is more than enough to guarantee a "unique enough" ID for this application. The reason I'm asking here on SO is that I would really like to either incorporate a human-readable year/month, or encode some piece of information about the source machine, or both.
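To make the timestamp option concrete, here is a minimal sketch of what I have in mind, in Python for brevity. The epoch, the tick size, and the name `timestamp_id` are all assumptions chosen for illustration. Counting 100 ms ticks from a fixed epoch and zero-padding to 11 digits gives roughly three centuries of range (10^11 ticks of 0.1 s each).

```python
from datetime import datetime, timedelta, timezone

# Assumed custom epoch: any fixed instant before the first ID is generated.
EPOCH = datetime(2009, 1, 1, tzinfo=timezone.utc)
TICK_MS = 100      # assumed resolution of one tick, in milliseconds
MAX_DIGITS = 11    # stays under the 12-digit hardware limit


def timestamp_id(now: datetime) -> str:
    """Encode `now` as a zero-padded count of ticks since EPOCH."""
    # Dividing one timedelta by another yields a whole number of milliseconds.
    elapsed_ms = (now - EPOCH) // timedelta(milliseconds=1)
    ticks = elapsed_ms // TICK_MS
    if not 0 <= ticks < 10 ** MAX_DIGITS:
        raise OverflowError("timestamp does not fit in the digit budget")
    return f"{ticks:0{MAX_DIGITS}d}"


if __name__ == "__main__":
    print(timestamp_id(datetime.now(timezone.utc)))
```

Two IDs generated within the same tick would collide, so this relies on the low concurrency described above rather than on any hard guarantee, and it carries no human-readable date or machine information.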
I'm hoping that someone can either help with a compromise on those soft requirements... or explain why none of them are possible given the other requirements.
(P.S. My "native" language is C# but code in any language or even pseudocode is fine if anybody has any brilliant ideas.)
Update:
Now that I've had the chance to sleep on it, I think what I'm actually going to do is use a timestamp encoding by default, and allow individual installations to switch to a machine-sequential ID by defining their own 2- or 3-digit machine ID. That way, customers who want to mess with the ID and pack in human-readable information can sort out their own method of ensuring uniqueness, and we're not responsible for misuse. Maybe we help out by providing a server utility to handle machine IDs if they happen to be doing all online installations.
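For the machine-sequential mode, one possible layout (an assumption, not something I've settled on) is a 6-digit year/month prefix, then the machine ID, then a per-machine counter that restarts each month. The function name and the field widths below are hypothetical, and the counter would have to be persisted locally on each machine.

```python
ID_LENGTH = 12     # hard limit imposed by the hardware tag
PREFIX_DIGITS = 6  # YYYYMM


def machine_sequential_id(year: int, month: int, machine_id: int,
                          sequence: int, machine_digits: int = 2) -> str:
    """Build YYYYMM + machine ID + sequence, always 12 digits long."""
    seq_digits = ID_LENGTH - PREFIX_DIGITS - machine_digits
    if not 0 <= year <= 9999 or not 1 <= month <= 12:
        raise ValueError("year/month out of range")
    if not 0 <= machine_id < 10 ** machine_digits:
        raise ValueError("machine ID does not fit in its field")
    if not 0 <= sequence < 10 ** seq_digits:
        raise ValueError("sequence counter overflowed its field")
    return (f"{year:04d}{month:02d}"
            f"{machine_id:0{machine_digits}d}"
            f"{sequence:0{seq_digits}d}")


if __name__ == "__main__":
    print(machine_sequential_id(2009, 12, 0, 1))   # 200912000001
```

With a 2-digit machine ID this leaves 4 digits (up to 9,999 IDs per machine per month), which comfortably covers fewer than 100 IDs per day. A 3-digit machine ID leaves only 3 digits, so at that rate the counter could run out within a month; shortening the prefix to YYMM would buy two digits back.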