Hi all,

Like all developers, we constantly deal with identifiers of some kind in our daily work. Most of the time, they refer to bugs or support tickets. When our software detects a bug, it creates a package whose name is built from a timestamp and a version number, which is a cheap way of creating reasonably unique identifiers so that packages don't get mixed up. Example: "Bug Report 20101214 174856 6.4b2".

My brain just isn't that good at remembering numbers. What I would love to have is a simple way of generating alpha-numeric identifiers that are easy to remember.

It takes about 5 minutes to whip up an algorithm like the following in Python, which produces halfway usable results:

import random

vowels = 'aeiuy'  # 'o' is left out because it is easily confused with 0
consonants = 'bcdfghjklmnpqrstvwxz'
numbers = '0123456789'

random.seed()

for i in range(30):
    chars = list()
    chars.append(random.choice(consonants))
    chars.append(random.choice(vowels))
    chars.append(random.choice(consonants + numbers))
    chars.append(random.choice(vowels))
    chars.append(random.choice(vowels))
    chars.append(random.choice(consonants))
    print(''.join(chars))

The results look like this:

re1ean
meseux
le1ayl
kuteef
neluaq
tyliyd
ki5ias

This is already quite good, but I feel it is still easy to forget exactly how they are spelled, so if you walk over to a colleague's desk and want to look one of them up, there's still potential for difficulty.

I know of algorithms that perform trigram analysis on text (say you feed them a whole book in German) and that can generate strings that look and feel like German words and are thus easier to handle generally. This requires lots of data, though, and makes it slightly less suitable for embedding in an application just for this purpose.
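To make the trigram idea concrete, here is a minimal sketch of a character-level trigram generator. It is not taken from any published algorithm; the function names and the tiny German word list are assumptions made purely for illustration, and a real model would be trained on far more text.

```python
import random
from collections import defaultdict

def build_trigram_model(words):
    """Map each two-character context to the characters seen right after it."""
    model = defaultdict(list)
    for word in words:
        padded = "^^" + word.lower() + "$"  # '^' marks the start, '$' the end
        for i in range(len(padded) - 2):
            model[padded[i:i + 2]].append(padded[i + 2])
    return model

def generate_word(model, rng, max_len=10):
    """Walk the model from the start context until an end marker or max_len."""
    context, out = "^^", []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == "$":
            break
        out.append(nxt)
        context = context[1] + nxt
    return "".join(out)

# Tiny illustrative corpus (assumption); a whole book would give better results.
corpus = ["wasser", "wetter", "winter", "sommer", "sonne",
          "garten", "gabel", "morgen", "montag"]
model = build_trigram_model(corpus)
rng = random.Random(42)
samples = [generate_word(model, rng) for _ in range(5)]
print(samples)
```

With such a small corpus the output mostly reproduces or splices the input words, which illustrates why this approach needs a lot of data to be useful.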

Do you know of any published algorithms that solve this problem?

Thanks!

Carl

+2  A: 

I am not sure that this answers your question, but maybe think about how many unique bug report numbers you need.

Simply using a four-character uppercase alphanumeric key like "BX-3D", you can have 36^4 ≈ 1.7 million bug reports.
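As a quick check of that arithmetic, the sketch below counts the keys for a flat four-character scheme and, for comparison, for the consonant/vowel pattern from the question. The variable names are only for illustration.

```python
import string

# Flat scheme: 4 positions, each drawn from A-Z and 0-9 (36 symbols).
alphabet = string.ascii_uppercase + string.digits
flat_keys = len(alphabet) ** 4
print(flat_keys)  # 1679616

# Pattern from the question: C V (C or digit) V V C.
vowels = 'aeiuy'
consonants = 'bcdfghjklmnpqrstvwxz'
numbers = '0123456789'
pattern_keys = (len(consonants) * len(vowels) * len(consonants + numbers)
                * len(vowels) * len(vowels) * len(consonants))
print(pattern_keys)  # 1500000
```

So the six-character pronounceable pattern covers roughly the same number of identifiers as four unrestricted alphanumeric characters.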

Edit: I just saw your sample. Maybe the results could be considerably improved if you used syllables instead of consonants and vowels.
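A rough sketch of what the syllable approach might look like follows. The syllable list and the `syllable_id` function are made up for illustration; in practice you would pick a larger list suited to your language.

```python
import random

# Hypothetical syllable list, chosen only for illustration.
SYLLABLES = ["ba", "ko", "mi", "ra", "su", "te", "na", "lu",
             "fi", "do", "ge", "pa", "ri", "zu", "ve", "ka"]

def syllable_id(rng, count=3):
    """Join `count` randomly chosen syllables into one identifier."""
    return "".join(rng.choice(SYLLABLES) for _ in range(count))

rng = random.Random()
for _ in range(5):
    print(syllable_id(rng))
```

Note that 16 syllables with 3 per identifier give only 16^3 = 4096 combinations, so the list size and the count have to grow with the number of reports you expect.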

Jens
+2  A: 

As you said, your sample is quite good. But if you want random identifiers that can easily be remembered, then you should not mix alphabetic and numeric characters. Instead, you could opt to postfix an alphabetic string with a couple of digits.

Also, in your sample you wisely excluded 'o', but forgot about the 'l', which you can easily confuse with '1'. I suggest you remove the 'l' as well. ;-)
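Both suggestions could be combined roughly like this. This is only a sketch: the `make_id` function and the choice of two consonant-vowel pairs followed by two digits are my assumptions, not a fixed recipe.

```python
import random

VOWELS = "aeiuy"                    # no 'o' (looks like 0)
CONSONANTS = "bcdfghjkmnpqrstvwxz"  # no 'l' (looks like 1)
DIGITS = "0123456789"

def make_id(rng, pairs=2, digits=2):
    """Letters first (consonant-vowel pairs), then a purely numeric suffix."""
    letters = "".join(rng.choice(CONSONANTS) + rng.choice(VOWELS)
                      for _ in range(pairs))
    suffix = "".join(rng.choice(DIGITS) for _ in range(digits))
    return letters + suffix

rng = random.Random()
for _ in range(5):
    print(make_id(rng))
```

Because the digits only ever appear at the end, a reader always knows whether a given position holds a letter or a number.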

Prutswonder