views:

469

answers:

9

I've found a number of different questions on generating UIDs, but as far as I can tell, my requirements here are somewhat unique (ha).

To summarize: I need to generate a very short ID that's "locally" unique, but does not have to be "globally" or "universally" unique. The constraints are not simply based on aesthetic or space concerns, but due to the fact that this is essentially being used as a hardware tag and is this subject to the hardware's constraints. Here are the specifications:

Hard Requirements

  • The ID must contain only decimal digits (the underlying data is a BCD);
  • The maximum length of the ID is 12 characters (digits).
  • Must be generated offline - a database/web connection is not always available!

Soft Requirements

  • We'd like it to begin with the calendar year and/or month. As this does waste a lot of entropy, I don't mind compromising on this or scrapping it entirely (if necessary).
  • IDs generated from a particular machine should appear sequential.
  • IDs do not have to sort by machine - for example, it's perfectly fine for machine 1 to spit out [123000, 124000, 125000], and machine 2 to spit out [123500, 123600, 124100].
  • However, the more sequential-looking in a collective sense, the better. A set of IDs like [200912000001, 200912000002, 200912000003, ...] would be perfect, although this obviously does not scale across multiple machines.

Usage Scenario:

  • IDs within the scope of this scheme will be generated from 10, maybe 100 different machines at most.
  • There will not be more than a few million IDs generated, total.
  • Concurrency is extremely low. A single machine will not generate IDs more often than every 5 minutes or so. Also, most likely no more than 5 machines at a time will generate IDs within the same hour or even the same day. I expect less than 100 IDs to be generated within one day on a given machine and less than 500 for all machines.
  • A small number of machines (3-5) would most likely be responsible for generating more than 80% of the IDs.

I know that it's possible to encode a timestamp down to 100 ms or even 10 ms precision using less than 12 decimal digits, which is more than enough to guarantee a "unique enough" ID for this application. The reason I am asking this here on SO, is because I would really like to either try to incorporate human-readable year/month in there or encode some piece of information about the source machine, or both.

I'm hoping that someone can either help with a compromise on those soft requirements... or explain why none of them are possible given the other requirements.

(P.S. My "native" language is C# but code in any language or even pseudocode is fine if anybody has any brilliant ideas.)

Update:

Now that I've had the chance to sleep on it, I think what I'm actually going to do is use a timestamp encoding by default, and allow individual installations to switch to a machine-sequential ID by defining their own 2- or 3-digit machine ID. That way, customers who want to mess with the ID and pack in human-readable information can sort out their own method of ensuring uniqueness, and we're not responsible for misuse. Maybe we help out by providing a server utility to handle machine IDs if they happen to be doing all online installations.

+3  A: 

When you install your software, also install a machiine id file/registry key which contains a unique numeric id. As you only have a few machines, this should not take more than 3 or 4 digits. Use these as the MS digits. Generate the remaining digits sequentially starting at 1.

anon
I agree with the approach, although it is begging the question: In order to generate a unique ID, I must first generate a unique ID. How do I generate the unique machine ID?
Aaronaught
Machine 1 gets, 001 machine 2 gets 002 etc. You have an automated build process to generate each installation, The "next ID" is stored on the build machine.
anon
Failing an automated build process, have the installation force a phone call to get the machine id.
anon
I guess I could do this. The application is intended to be used offline, but there's technically nothing in the spec requiring it to be *installed* offline. I'd have to have an "activation" server somewhere, though, and figure out a way to generate parallel sequences for different customers, although those are slightly more tractable problems. I'll have to chew on this for a bit...
Aaronaught
+3  A: 

How about yyMMddhhmmID?

yy = two-digit year
MM = two-digit month
dd = two-digit day
hh = two-digit hour (24-hour time)
mm = two-digit minute
ID = machine-specific ID

Example: 0912113201 from machine with ID = 01.

Alternatively (if you don't like two-digit years (Y2K lol)), how about yyyyMMIDxxxx?

yyyy = four-digit year
MM = two-digit month
ID = machine-specific ID
xxxx = sequentially-incremented integer

Example: 200912010001 from machine with ID = 01.

As you said that each machine will only generate one identifier maximum every five minutes, this gives you room for 8,928 (24 * 31 * 60 / 5 = 8928) identifiers per month which will fit in xxxx. Here you could squeeze the year down to a three-digit year yyy (009, e.g.) if you needed an extra digit in the xxxx sequence or the machine ID.

Both of these fit timestamp/machine ID as you requested.

We all like concrete code:

class Machine {
    public int ID { get; private set; }
    public Machine(int id) {
        ID = id;
    }
}

 class IdentifierGenerator {
    readonly Machine machine;
    int seed;
    const int digits = 4;
    readonly int modulus;
    readonly string seedFormat;
    public IdentifierGenerator(Machine machine) {
        this.machine = machine;
        this.modulus = (int)Math.Pow(10, digits);
        this.seedFormat = new string('0', digits);
    }

    public string Generate() {
        string identifier = DateTime.Now.ToString("yyyyMM") 
                                + machine.ID.ToString("00") 
                                + seed.ToString(seedFormat);
        seed = (seed + 1) % modulus;
        return identifier;
    }
}

Machine m = new Machine(1);
IdentifierGenerator gen = new IdentifierGenerator(m);
Console.WriteLine(gen.Generate());
Console.WriteLine(gen.Generate());

Outputs:

200912010000
200912010001
Jason
Still leaves open the question of where the machine ID comes from, and tightly-controlled distribution (ID hard-coded into installer) is pretty much out of the question (now I have two problems). Any thoughts on this?
Aaronaught
What are your deployment options?
Jason
Single MSI/EXE installer that would be distributed either online or via CD (most likely one CD per customer, not per machine). There's no special licensing system - the hardware is pretty much a big dongle. I could see a single customer using 10 devices or 1 million; really can't make any assumptions about their environment other than .NET Framework.
Aaronaught
Yucky. Tough. You're going to have to either have some human intervention (please call 1-800-867-5309 for machine ID) or have the install routine phone home via the internet to get a machine ID.
Jason
A: 

There are 864000 100ms ticks in 24 hours, so tacking that onto a date might work 09.12.24.86400.0, but you have to lose the century to fit in 12 digits, and you don't have any space for machine IDs.

Pete Kirkham
A: 

Idea number one:

YYMMDDmmnnnn

where

YY is two digit year
MM is two digit month
DD is two digit day
mm is a two digit code unique to that machine (00 - 99)
nnnn is a sequential four digit code for that machine on that day.

~~

Idea number two:

mmmmnnnnnnnn

Where

mmmm is four digit code unique to the machine
nnnnnnnn is a sequential number.
David Oneill
A: 

My suggestion would be to combine multiple approaches in a single id. For example: start with the two year digits, the two month digits and then generate a random number with the time as a seed for the next several digits and then a unique machine id for the last couple. Or something like that.

Poindexter
A: 

Each machine gets a starting id of DDNNN, where DD is a unique machine identifier and NNN is the current identifier generated by that machine that day. Each machine keeps track of the ids that it has generated on a particular date and allocates the next one when it needs a new one by incrementing the last one by 1. It resets its counter to 0 at the beginning of each day. The date YYYYDOY is prepended to the number generated by each machine (4-digit year, 3-digit day of year). The number is guaranteed unique because the machine identifier is unique.

If you needed more space for more machines, you could drop the millenium from the year and add a digit for the machine id: YYYDOYDDDNNN.

tvanfosson
+3  A: 

"The reason I am asking this here on SO, is because I would really like to either try to incorporate human-readable year/month in there or encode some piece of information about the source machine, or both."

Let me start by saying I've dealt with this before and attempting to store useful information into a serial number is a BAD idea long term. A device serial number should be meaningless. Just like the primary key of a database record should be meaningless.

The second you start trying to put real data into your serial number, you've just thrown BUSINESS LOGIC into it and you will be forced to maintain it like any other piece of code. Future you will hate past you. Trust me on this. ;o)

If you attempt to store date/time values, then you'll waste numeric space with invalid time/dates. For instance you'll never have anything greater than 12 in the month field.

A straight epoch / unit time counter would be better, but for a machine that only generates a few id's per minute you'll still waste a lot of space.

12 digits is not a lot of space. Look at the VIN page on Wikipedia. Space for only a few manufacturers, only a few thousand cars. They are now reusing VINs because they ran out of space by packing meaning into it.

http://en.wikipedia.org/wiki/VIN

That's not to say ALL meaning in a serial number is bad, just keep it strictly limited to making sure the numbers don't collide.

Something like this...

  • Position 1-3: 999 Machines
  • Position 4-12: Sequential Numbers

That's ALL you need to avoid collisions. If you adding a location digit, then you are screwed when you get to 11 locations.

Sorry if this sounds like a rant. I deal with this a lot manufacturing electronics and various machined parts. It had never ended well long term unless there's LOTS of space available, or a secondary tag (which -wow- provides the necessary id space mentioned before)

Great Turtle
I agree for the most part; the subtle difference here is that the number is customer-specific, so unlike a VIN or a typical serial number, it does not matter if two different customers come up with the same number. The "real" serial number is independent. I feel the same way you do but wanted to see if there was a way to satisfy certain people's preferred naming conventions.
Aaronaught
Is not an automotive VIN number not a unique serial with useful information tied into it?
Chris
Chris: An automotive VIN is NOT unique - that was my point. I've written Dealer Management Systems and collisions while rare they do happen. The problem is there are/were multiple VIN encoding scheme that have changed over the years and/or country. Having to fix things the handle this "not-unique" unique value was expensive and I have seen similar problems in other industries. Just trying to help others avoid the problem where possible. Numbers are free after all. :)
Great Turtle
A: 

"A single machine will not generate IDs more often than every 5 minutes or so"

Assuming this is true, then just use the timestamp. (32 bit Unix time has 10 decimal digits but will run out in 2038)

But I think its rather optimistic to assume there won't be a collision.

"IDs generated from a particular machine should appear sequential."

Then your only option is to use a sequence number.

Which doesn't really seem to match what you say in later constraints?

Concatenate a padded version of the node id to get unique values across the cluster.

C.

symcbean
+1  A: 

I'm gathering you're developing for Windows (re: your comment about "MSI/EXE" in response to Jason's answer). As such, you could WMI or similar to get some unique hardware attribute (processor or HDD serial number, or NIC's MAC address for example) to base a unique machine ID upon. An alternative might also be using the unique serial number of the hardware you are yourself developing (if it has one).

That would most likely be longer than you need, so you could potentially truncate or hash it to reduce it to (say) 16 bits or so and use that as your machine ID. Obviously, this may cause collisions, but the small number of machines (~100) means this is unlikely, and using the truncated output of a cryptographic hash (say MD5) makes this even less so.

Then, since you have a (most probably unique) machine ID, you can then generate essentially unique IDs using the approaches listed by the other answers.

Mac