views:

260

answers:

5

This is not a real situation; please ignore legal issues that you might think apply, because they don't.

Let's say I have a set of 200 known valid license keys for a hypothetical piece of software's licensing algorithm, and a license key consists of 5 sets of 5 alphanumeric case-insensitive (all uppercase) characters. Example: HXDY6-R3DD7-Y8FRT-UNPVT-JSKON

Is it possible (or likely) to extrapolate other possible keys for the system?

What if the set was known to be consecutive; how do the methods change for this situation, and what kind of advantage does this give?

I have heard of "keygens" before, but I believe they are probably made by decompiling the licensing software rather than examining known valid keys. In this case, I am only given the set of keys and I must determine the algorithm. I'm also told it is an industry standard algorithm, so it's probably not something basic, though the chance is always there I suppose.

If you think this doesn't belong in Stack Overflow, please at least suggest an alternate place for me to look or ask the question. I honestly don't know where to begin with a problem like this. I don't even know the terminology for this kind of problem.

A: 

The system could generate the keys randomly then have clients check against a central server. That is just as secure as any other DRM algorithm, that is to say not at all. In this case, extrapolation is impossible.

Matthew Flaschen
This is not the problem though; there does exist an algorithm, and I need to find it.
Ricket
You "need" to solve your thought experiment? Doesn't sound so hypothetical to me any more.
Chad Birch
I need to find the algorithm in order to solve the problem. Quid pro quo.
Ricket
There's *a* reason not to require internet activation: it's intolerable for some customers, who then may not buy your product.
bobince
@bobince, I consider all DRM intolerable. Sorry if I didn't make that clear. My point was that the keys could be random.
Matthew Flaschen
Ah, I see — misunderstanding of ‘no reason’ :-)
bobince
+1  A: 

Assuming that the system is cryptographically strong knowing those keys will do you no good. Now, many such systems are implemented by guys too cheap to buy a real keygen so you might still have a hope.

Just a little back of the envelop estimation says that such a key has 125 (was 800 oops, thanks for catching that) bits of information, if you sampled that space enough you might be able to make some sort of an attack, but you are talking about a huge number of sample points. But hey, what other plans did you have for your spare time?

Even the big guys mess this up, a half dozen years ago there was a mistake in the way MSDN keys were being generated that allowed some sort of brute force attack. You would find guys selling MSDN subscriptions on eBay as part of a corporate licensing bundle. You would submit your info and they would give you a preregistered MSDN account in a couple of days. I am pretty sure they were taking advantage of a bug in the implementation and brute forcing the enrollment until one stuck.

A company I worked for bought one, Microsoft honored it since we were none the wiser when we bought it, but they were interested in the address of the guy who sold it to us.

Ukko
Where do you get 800 bits from? At 8 bits/character, it's 200 bits maximum (25*8). Considering the character space is (often, but not always) 2-9 and A-Z except A,E,I,O,U, or 30 possible characters, I come up with ~ 5 bits/character or ~125 bits. We eliminate 0, 1 for their ambiguity with O and I, and eliminate vowels because you'd be suprised how often a random string of characters forms words that *somebody* will find offensive!
Bob Kaufman
Oops, would you believe I spilled coffee on my envelope ;-) I had guessed there were 32 possible values per position, since 32 is a nice number and it left off some cases like 1 and l. Then I took 32 = 2^5 and did god knows what. I multiplied 32 * 25 instead of 32 *5, the former gets you 800 and the latter the correct estimate of 160. Which matches with your 125 bit calculation using a smaller alphabet. If it is any consolation I did all the math in my head, I guess that might be more of an indictment...
Ukko
A: 

In general, the answer is, "No, you can't do anything useful."

If the people generating the keys got lazy and failed to use some sort of cryptographic-quality hash off of an index number (with sufficient bit-mixing to thwart any inspection on your part), then you might assume some sort of functional form of random number generation and see if you can back out, for example, a modulus for a linear congruential random number generator, or a series of bit-mixing shifts and adds and such as in the Jenkins hash function or whatever.

There are no algorithms for going from some generic structure that you spot to the algorithm that produces said structure; something akin to this is what you appear to be asking for. (Such algorithms are provably impossible in general; if you want the simplest algorithm that can compute your keys, the problem is isomorphic to the computation of Kolmogorov complexity, which is fiendishly hard ("effectively impossible thus far") to compute.)

Rex Kerr
+1  A: 

Keys that are validated offline tend to be defined by a set of some properties. Certain subsets of bits have certain values or symmetries. If certain subsets of bytes passed to certain functors return the expected result then the key is deemed valid.

If you researched common algorithms for generating keys you could probably come up with a universe of possible properties. You could then use inductive logic programming to find which of those properties apply to all valid keys, but not invalid keys. (You also need a set of invalid keys, but those are easy to generate). From the results you could theoretically write a keygen. You could also probably write a paper out of it, if you can get it to work. Good luck.

If they're validated online however, they may simple be pseudo-random numbers that are checked against a database. In that case you are SOL.

patros
+1  A: 

This is notoriously hard to solve in the general case. However, if

I'm also told it is an industry standard algorithm

if this is the case, you should obtain a list of those "standard algorithms", and analyze them for weaknesses.

My naïve guess is that most key generation is of the form x || hash(x || fixed), where x is a randomly generated value for each key, and fixed is a fixed value. Using this form, keys can be validated quite easily (extract x, calculate hash(x || fixed), see if it matches).

Assuming you knew the exact algorithm used, you'd either have to find a weakness in the algorithm (not likely, unless they are using a hash with known vulnerabilities), brute-force the fixed value, etc.

Given that there are lots of hashes which do not have known vulnerabilities and if you assume that the guys who chose the fixed value where not dumb... this might be a tough cookie unless you have good cryptanalysis skills available.

So the problem is very hard to solve if the guys who designed the algorithm are not dumb. But they might be...

alex