views:

293

answers:

13

I'm writing a small system that will allow me to sell my band's music at gigs by generating vouchers that can be redeemed for MP3s at our website.

The vouchers will need a code that the user types in. The code needs to have the following qualities:

  1. Some level of human readability in terms of length and content, to prevent user frustration and data entry error.
  2. Given one voucher code, not trivial to guess another voucher code.

If I use GUIDs I'm concerned about point 1. If I use an incrementing integer I'm concerned about point 2. There has to be some happy medium in between, right? I thought perhaps this work has already been done and there's an ideal solution waiting out there for me. In the absence of that, I'm thinking I'll go with a random alphanumeric string, or possibly letters only (excluding I and O for clarity), and have the application block IP addresses that fail X number of times, which would indicate a possible brute force attack. If I went with that, how long of a string and what value of X would work, and why?

Thanks for your help!


Update: I wasn't totally explicit about the method: I will generate lists of voucher codes for printing, then enter the "sold" codes after a gig. Therefore I think elements like a checksum are not necessary like they are in software keys that don't use validation servers.
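
The workflow in the update, together with the failed-attempt limit floated in the question, might be sketched roughly as follows. Everything named here is an assumption made for illustration: the `VoucherStore` class, the in-memory dictionaries standing in for a database, and the limit of 5 failures per IP are hypothetical, and 5 is not a recommended value for X.

```python
from collections import defaultdict

MAX_FAILURES = 5  # hypothetical value of "X"; tune to taste


class VoucherStore:
    """In-memory sketch: codes move from printed -> sold -> redeemed."""

    def __init__(self, printed_codes):
        self.state = {code: "printed" for code in printed_codes}
        self.failures = defaultdict(int)  # failed attempts per IP address

    def mark_sold(self, code):
        # Entered after the gig; only printed codes can become sold.
        if self.state.get(code) != "printed":
            raise ValueError(f"cannot mark {code!r} as sold")
        self.state[code] = "sold"

    def redeem(self, code, ip):
        if self.failures[ip] >= MAX_FAILURES:
            return "blocked"
        if self.state.get(code) == "sold":
            self.state[code] = "redeemed"
            return "ok"
        self.failures[ip] += 1  # unknown, unsold, or already-redeemed code
        return "invalid"
```

Because a printed-but-unsold code is rejected just like an unknown one, a stolen stack of unsold vouchers is worthless under this sketch.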

+2  A: 

5 blocks of 5 characters each should be sufficient - four blocks for the "key", the fifth as a checksum to ensure validity. And of course, don't use the whole keyspace.

That's roughly how software serial numbers appear to be laid out, anyway.
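
A minimal sketch of the "four key blocks plus one check block" layout, under stated assumptions: the 32-symbol alphabet, the `SECRET` value, and the use of a keyed HMAC as the "checksum" are all choices made for illustration, not something the answer specifies.

```python
import hashlib
import hmac
import secrets

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # 32 symbols, no I/O/0/1
SECRET = b"change-me"  # hypothetical server-side secret


def check_block(key: str) -> str:
    """Derive the 5-character check block from the 20-character key."""
    digest = hmac.new(SECRET, key.encode("ascii"), hashlib.sha256).digest()
    return "".join(ALPHABET[b % 32] for b in digest[:5])


def make_code() -> str:
    key = "".join(secrets.choice(ALPHABET) for _ in range(20))
    body = key + check_block(key)
    # Split the 25 characters into 5 dash-separated blocks of 5.
    return "-".join(body[i:i + 5] for i in range(0, 25, 5))


def is_well_formed(code: str) -> bool:
    body = code.replace("-", "").upper()
    if len(body) != 25 or any(c not in ALPHABET for c in body):
        return False
    return hmac.compare_digest(check_block(body[:20]), body[20:])
```

With a check block like this, only one string in 32^5 passes `is_well_formed`, which is one way to read "don't use the whole keyspace"; a typo can then be rejected before any database lookup.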

Anon.
Interesting, I never knew that! For my system, however, that kind of algorithm isn't directly applicable, as I'll be pre-generating these numbers and then "validating" the codes I sold after a gig.
Barry Fandango
It's still applicable - you don't need to give out all of the codes, after all.
Anon.
+9  A: 

You could use a Markov Chain trained on English syllables to create a sentence composed of pronounceable-gibberish words. Just add the generated sentence to a database of valid vouchers when you print them (and invalidate them when they're redeemed, of course).
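
As a rough illustration of the idea, here is a letter-level Markov chain sketch. It is trained on a tiny hard-coded word list rather than a real corpus of English syllables, so `CORPUS`, the chain order of 2, and the function names are all assumptions; with so little training data the output has far too little variety for real vouchers.

```python
import secrets
from collections import defaultdict

# Tiny stand-in corpus; a real one would be far larger (assumption).
CORPUS = ["banana", "guitar", "melody", "harmony", "rhythm", "chorus",
          "ballad", "tempo", "encore", "lyric", "stereo", "treble"]


def train(words, order=2):
    """Map each `order`-letter context to the letters seen after it."""
    model = defaultdict(list)
    for word in words:
        padded = "^" * order + word + "$"  # ^ = start padding, $ = end marker
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def make_word(model, order=2, max_len=10):
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = secrets.choice(model[context])
        if nxt == "$":  # the chain chose to end the word here
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out)


def make_phrase(model, n_words=4):
    return " ".join(make_word(model) for _ in range(n_words))
```

Each generated phrase would still be stored in the voucher database, as the answer says; the chain only makes the codes easier to read and type, it does not validate them.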

caf
Reminds me of http://thedailywtf.com/Articles/The-Automated-Curse-Generator.aspx
Anon.
You can also use some form of N-gram analysis: it may be easier to understand and implement. http://en.wikipedia.org/wiki/N-gram
Martinho Fernandes
My last comment is a bit confusing: N-gram analysis can be used to do the "training" part.
Martinho Fernandes
If you want to avoid the "Automated Curse Generator" problem, you can train it on words instead of syllables. I wrote such a thing in C# last week, and after feeding it a book for analysis it spits out "sentences" like "how many men are now faced with a lay education", "it would be to go on if you dont understand the situation".
Martinho Fernandes
As long as it's a rock band, an automated curse generator probably isn't a problem ;)
caf
Anon, excellent story!
Victor Hurdugaci
+3  A: 

Just 8 alphanumeric characters (excluding I and O) give 34^8 = 1785793904896 possible combinations. That's for all intents and purposes unguessable as long as you don't have 5 billion vouchers.
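
The arithmetic above can be checked with a short sketch that also generates such codes. The figure of 10,000 issued vouchers is a hypothetical number chosen only to show how the per-guess odds work out.

```python
import secrets
import string

# Uppercase letters and digits, minus I and O as the question suggests.
ALPHABET = "".join(c for c in string.ascii_uppercase + string.digits
                   if c not in "IO")


def make_code(length: int = 8) -> str:
    """Pick each character independently from a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


keyspace = len(ALPHABET) ** 8        # 34 ** 8
issued = 10_000                      # hypothetical number of vouchers sold
odds_per_guess = issued / keyspace   # chance one random guess hits a sold code
```

With those assumed numbers, a single random guess succeeds roughly once in 180 million tries, which is what makes a modest per-IP failure limit sufficient.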

Andreas Bonini
Since Barry is in Canada, 5 billion vouchers can be either 5000000000000 or 5000000000. Crazy people.
Martinho Fernandes
Actually since we're obsessed with our British heritage we dumped the long scale when they did, back in the '70s. By either measure though I like these numbers.
Barry Fandango
+2  A: 

Hmm, I don't know how most systems work, but I think it would be neat and simple to define a static number and multiply it by another, random number. Then, if the big GUID is a multiple of your static number, you are good.

Easy to produce, not easy to guess a new one (short term use only)

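/* rand() is declared in <stdlib.h> */
int i = 61234;                      /* the secret static number */
int j = rand() % 99999;             /* random factor in 0..99998 */
long long GUID = (long long)i * j;  /* widen first: the product can overflow a 32-bit int */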

will give you a phone number length GUID

only 99999 uses though! doh

Charles
Hmm, if I can't hit 100,000 sales our album can't go platinum!
Barry Fandango
+4  A: 

AOL used to use a random combination of two words for the CDs they sent out. You can take the same approach, and just increase the number of words to get the odds that you require.
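
A small sketch of the word-combination approach, assuming a hypothetical `WORDS` list (a real list would hold thousands of short, inoffensive words). The helper `words_needed` is also an illustration: it works out how many words to string together for a target keyspace.

```python
import secrets

# Hypothetical stand-in list; a real one would hold thousands of short words.
WORDS = ["amp", "band", "bass", "beat", "drum", "echo", "fret", "gig",
         "horn", "jam", "key", "note", "riff", "solo", "song", "tune"]


def make_code(n_words: int = 2, words=WORDS) -> str:
    """Join randomly chosen words with dashes, e.g. two words for an AOL-style code."""
    return "-".join(secrets.choice(words) for _ in range(n_words))


def words_needed(list_size: int, target_keyspace: int) -> int:
    """Smallest n with list_size ** n >= target_keyspace."""
    n = 1
    while list_size ** n < target_keyspace:
        n += 1
    return n
```

For instance, with a list of about 5,000 words, `words_needed(5045, 10**11)` says three words are enough to exceed a hundred billion combinations.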

Mark Ransom
I like this! Three words from [this list](http://www.math.toronto.edu/jjchew/scrabble/lists/common-234.html) of 3 and 4 letter words would have a keyspace of 128,405,466,125... very acceptable.
Barry Fandango
+1  A: 

On the blocking of brute force attacks I'd not bother to start with. With respect to you and your band, it's not as though you're protecting something really important.

It just seems a little disproportional to me.

Tom Duckering
You're absolutely right, I'm having entirely too much fun designing the system. But there you go, I'm a programmer at heart. Plus, if it all works out I might host other bands' albums.
Barry Fandango
They're protecting their work. Notice the word "sell" in the question.
Martinho Fernandes
A: 

You can try something like a random letter sequence generator. You can mix and match letters and numbers as well.

ram
+1  A: 

One simple solution is to call the getHashCode method that most languages have on their string types. Set the string to some word from your list of approved words. Then call getHashCode, and the result will be your key. To verify it, compare it against your list of existing word hashes, and maybe delete it from the list so it can't be used again.

TskTsk
+4  A: 

I would use your own encoding scheme. In addition to omitting I and O, for optimal readability it's also a good idea to omit all but one letter out of near-homonym sets (C/E, M/N) and multisyllabic letters, such as W, and of course stick to one case.

As far as length, you could use 60 bits, plus a 4-bit checksum. 64 bits is enough to store the time to millisecond granularity for several thousand years, so it's for all practical purposes unguessable. At say 4 bits per letter, that's 16 letters long. Even half that length is probably plenty.

Another way to think of this is in the form of automobile license plates: 3 letters and 3 numbers is enough to cover a pretty large state, and tends to be very readable. Unless you provide a way for someone to hack codes at high-speed, they certainly won't be guessable at human time scales.
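
The 60-bit payload plus 4-bit checksum layout could look something like the sketch below. The 16-letter `ALPHABET` and the XOR-of-nibbles checksum are assumptions picked for illustration; the answer does not say which letters to keep or which checksum to use.

```python
import secrets

# Hypothetical 16-letter alphabet (4 bits per letter): no vowels, no W, and
# only one letter kept from some similar-sounding groups. The exact choice
# is an assumption, not something the answer specifies.
ALPHABET = "BDFGHJKLMQRSTXYZ"


def checksum(payload: int) -> int:
    """XOR of the 4-bit nibbles of the payload."""
    check = 0
    while payload:
        check ^= payload & 0xF
        payload >>= 4
    return check


def make_code() -> str:
    payload = secrets.randbits(60)
    value = (payload << 4) | checksum(payload)  # 64 bits total
    # Emit 16 letters, most significant nibble first.
    return "".join(ALPHABET[(value >> shift) & 0xF] for shift in range(60, -1, -4))


def is_well_formed(code: str) -> bool:
    code = code.upper()
    if len(code) != 16 or any(c not in ALPHABET for c in code):
        return False
    value = 0
    for c in code:
        value = (value << 4) | ALPHABET.index(c)
    return checksum(value >> 4) == (value & 0xF)
```

An XOR checksum like this catches any single mistyped letter but not two letters swapped, so it is a typo filter rather than a security measure; guess resistance comes from the 60 random bits.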

RickNZ
homonym sets! This is what I come to SO for. :)
Barry Fandango
@RickNZ: 64 bit timestamps are used by Windows NTFS and OpenVMS: both count at ten million ticks per second. The year range is from 1601 to 60,055 for NTFS and 1858 to 31,084 for VMS. (VMS reserves the "negative" half of the range for relative time purposes.)
wallyk
See my update...
RickNZ
+1  A: 

Probably best to avoid all the vowels[*], thus avoiding all the swearwords.

[*] Including W if you're Welsh!

Sharkey
W is also the only multi-syllabic letter, so it takes much longer to say (hence my intense dislike of "www" for websites!).
RickNZ
You don't say "dub dub dub"?
wallyk
remember "trip dub"? or worse yet, back in the nineties on the radio you would hear "aitch tee tee pee, colon, forward slash, forward slash, ..."
Barry Fandango
Rick: totally agree, there's plenty of reasons to avoid it! By the time you cut out all the vowels and all the easy to mistake letters you get down to about 16, which is just right for 4 bits per character anyway.
Sharkey
+1  A: 

I'm assuming you're getting an email address when they purchase the voucher (you should). If so, why not just email them a single-use GUID? That way both you and they have a record of it, you can track redemptions, you don't run the risk of guessing (or at least not one worth bothering with), the user doesn't have to remember anything because it's there in the email, and you don't have to code anything.

They give you email address. You email GUID (with link). They click link and get song. GUID use is registered in system and will no longer work.
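
The single-use flow above might be sketched like this. The `BASE_URL` placeholder, the in-memory `unused_tokens` set standing in for a database table, and the function names are assumptions, and actually sending the email is left out.

```python
import uuid

BASE_URL = "https://example.com/redeem/"  # placeholder domain

unused_tokens = set()  # stands in for a database table


def issue_link() -> str:
    """Create a fresh token and return the link to email to the buyer."""
    token = str(uuid.uuid4())  # random (version 4) GUID
    unused_tokens.add(token)
    return BASE_URL + token


def redeem(token: str) -> bool:
    """True the first time a valid token is used, False afterwards."""
    if token in unused_tokens:
        unused_tokens.remove(token)
        return True
    return False
```

Since the buyer clicks a link rather than typing the GUID, its length stops being a readability problem.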

Chuck
As much as I would like to get a list of fan email addresses, I think it would act as a deterrent. We're talking a 1:00 AM drunken $5 impulse buy, and writing down your email address could really dampen that impulse.
Barry Fandango
Good point! If I like the band that wouldn't deter me but I may be the exception rather than the rule.
Chuck
A: 

Why not just go with the GUID and then replace any questionable characters with a different letter (so 0 becomes 'h', 1 becomes 'q', and so forth)?

Grant Peters
+2  A: 

Well, if you really want human readable, you can use BubbleBabble. Create a Perl script like the following:

#!/usr/bin/perl
use Digest::BubbleBabble qw(bubblebabble);
use Digest::SHA1 qw(sha1);
print bubblebabble(Digest => sha1(join(' ', @ARGV))), "\n";

Then feed it any command line argument you want to get output like the following:

xogan-nydut-zogiv-kotyn-ledah-taseb-gyhib-tucel-vudul-mykom-mexax

Or if Perl's not your preference, you can use APG's pronounceable password mode (also available online) to get output like this:

BedHiv
cotsEub
AvRabinn
rarcUs
TeuvVarn
yuwats

Honestly, this level of human readability is overkill; RickNZ's answer should work just fine (and is pretty close to what we did for some software keys). But BubbleBabble is kind of fun.

Josh Kelley
Definitely fun, thanks for taking the time.
Barry Fandango