views:

734

answers:

12

It need not be meaningful words - more like random password generation, but the catch is - they should be unique. I will be using this for some kind of package / product code. Which is the best method available? :)

A: 

Given that you mention passwords here, I'll assume you need a secure method (ie: someone shouldn't be able to guess someone else's password based on knowing any other password). You could use the following:

  1. Decide on a master password, for example "MasterPassword"
  2. For each password generated, append to that either a random or sequential nonce, for example "MasterPassword1", "MasterPassword2".
  3. Perform a cryptographic hash on that (SHA, MD5, etc) and covert the hash to a hexadecimal representation, eg "ce7f181a44a4a5b7e43fe2b9a0b1f0c1".
  4. Truncate that to as many characters as you need - perhaps seven as you indicated: "ce7f181".
  5. Check if that has been assigned before. If not, return that as your password. Otherwise, repeat from 2.

If security is not an issue, steps 1 and 2 by themselves would be sufficient. If security is an issue, it is vital that no one but yourself knows the value of "MasterPassword".

Mac
I'm not big into crypto so I can't put words to my discomfort with this idea...but it makes me uncomfortable. You've reduced the possibilities by going with base16 instead of, e.g., base-26+26+10; plus (guessing here) you compromise the integrity of a hash when you truncate it. I admit, though, the requirement of a unique password is a bit strange, too...
Michael Haren
Agreed on all points. I'll happily concede the point with regards to the hex, you're right that the more potential characters the better. Also agreed on the truncation, but short of requiring 32-character passwords (or worse) it's unavoidable. Besides, the hash is more used as a means of generating a highly unique password given a similar base to other passwords, than as a security measure. Really, a non-crypto hash or maybe a CRC could be used too, except that that makes the master password more vulnerable.
Mac
Cryptographic hashes are not unique. They are *definitely* not unique if you truncate them. This method really has no advantage over simply generating random strings and checking _those_ to see if they've been assigned.
Zarel
P.S. And I _am_ [somewhat] big into crypto, so I can put into words my discomfort with this idea. ;)
Zarel
OK, I've poorly worded my response above. I never meant to imply that the hash would be unique, merely that it was highly likely to be unique. If security is required, then there are advantages to this method compared to a non-cryptographic random string generator - it is (provably) more difficult for an attacker to reverse engineer *a particular* other user's password given a sample of generated passwords. You *could* essentially look at it as character generation using a cryptographic PRNG (the hash), in which case there is in fact little difference between this answer and others.
Mac
*Really* poorly worded that comment, now that I reread it... ;)
Mac
+2  A: 

It is generally not possible to generate sequences with both unique and random elements: obviously to be unique the algorithm has to take into account the previously generated elements in the sequence, so the next ones will not really be random.

Therefore your best bet would be to detect collisions and just retry (which could be very expensive in your particular case).

If you are constrained to just 7 chars, there's not much you can do above:

$allowed_chars = 'abcdefghijklmnopqrstuvwxz';
$allowed_count = strlen($allowed_chars);
$password = null;
$password_length = 7;

while($password === null || already_exists($password)) {
    $password = '';
    for($i = 0; $i < $password_length; ++$i) {
        $password .= $allowed_chars{mt_rand(0, $allowed_count - 1)};
    }
}

This should eventually give you a new password.

However, in similar cases I have encountered I usually pick a larger password size which also happens to be the size of the hex representation of a popular hash function (e.g. md5). Then you can make it easier on yourself and less error prone:

$password = time(); // even better if you have some other "random" input to use here

do {
    $password = md5(time().$password);
}
while (already_exists($password));

This also has the added advantage that the sequence space is larger, hence there will be less collisions. You can pick the size of the hash function according to the expected numbers of passwords you will generate in the future to "guarantee" a low collision probability and thus less calls to the possibly expensive already_exists function.

Jon
'alphanumeric' implies you'll need 0...9 in your $allowed_chars variable :)
Catchwa
I think I got the point across ;-)
Jon
A: 

Here's how I would solve this problem:

Consider that the 7 characters can be one of 26 letters (abc..z), or 10 numbers (01...9). This makes 36 possible characters.

Each time your application generates a new code, have it increment a global variable. You can turn this unique number into a unique string by using a "Hexatridecimal" converter, and by adding filler characters to make up the rest of the string.

Take a look at this link. I think this guy had the same problem as you: http://www.codemaxima.com/2010/04/the-hexatridecimal-numbering-system/

Michel Carroll
A: 
$random = substr(hash('md5',openssl_random_pseudo_bytes(32)),0,7);
crrodriguez
A: 

Here's a way you could do it without hashes or loops:

$password = sprintf(
    "%04s%03s",
    base_convert(mt_rand(0, pow(36, 4) - 1), 10, 36),
    base_convert(mt_rand(0, pow(36, 3) - 1), 10, 36)
);

As a few others have mentioned, ensuring uniqueness is more complicated, and should be unneeded. The simplest way you could do it would be to add extra characters at the end, incrementing with each password generated.

nickf
+1  A: 
Alix Axel
careful of integer overflow here. `intval(microtime(true) * 10000) == -65867797` - this gives you a 6 character output.
nickf
@nickf: You're right! Better drop that first line off.
Alix Axel
A: 

Use Kohana text,

http://docs.kohanaphp.com/helpers/text

For example,

   $prod_id = text::random('alpha', 7);

If you don't want use the framework, you can simply copy the code. You will find lots of goodies there.

ZZ Coder
A: 

Heres a very simple way

$chars = 'abcdefghijklmnopqrstuvwxyz0123456789';
$temp_pw = substr( str_shuffle( $chars ), 0, 7 );
if ( check_unique( $temp_pw ) ) {
    $pw = $temp_pw;
}

You'll have to implement your own check_unique function. That part should be easy.

Galen
A: 

+1 to @Michael Haren's comment. If the passwords on your site should not have a constraint to be unique.

If I try to use a given password and I get an error that I can't use it because it's already in use, then I know some user on the system has that password. If there are 1000 users I only need to try a max of 1000 other accounts before I find the one who has that password.

Not really answering your question, but more than a comment. So I'm marking this CW.

Bill Karwin
I believe the "password generation" term was an example, I think he wants to generate product codes.
Alix Axel
@Alix: Thanks, I didn't read his question closely enough. My bad! I'll leave my CW answer here so readers who consider forcing passwords to be unique can see the counter-argument.
Bill Karwin
+1  A: 

A random alphanumeric (base 36 = 0..9 + a..z) value that has 7 chars has to have a base 10 representation between 2176782336 and 78364164095, the following snippet proves it:

var_dump(base_convert('1000000', 36, 10));                   //  2176782336
var_dump(base_convert('zzzzzzz', 36, 10));                   // 78364164095

In order for it to be unique we have to rely on a non-repeating factor, the obvious choice is time():

var_dump(time());                                            //  1273508728
var_dump(microtime(true));                                   //  1273508728.2883

If we only wanted to ensure a minimum uniqueness factor of 1 unique code per second we could do:

var_dump(base_convert(time() * 2, 10, 36));                  // 164ff8w
var_dump(base_convert(time() * 2 + 1, 10, 36));              // 164ff8x
var_dump(base_convert(time() * 2 + 2, 10, 36));              // 164ff8y
var_dump(base_convert(time() * 2 + 3, 10, 36));              // 164ff8z

You'll notice that these codes aren't random, you'll also notice that time() (1273508728) is less than 2176782336 (the minimum base 10 representation of a 7 char code), that's why I do time() * 2.

Now lets do some date math in order to add randomness and increase the uniqueness factor while complying with the integer limitations of older versions of PHP (< 5.0?):

var_dump(1 * 60 * 60);                                       //       3600
var_dump(1 * 60 * 60 * 24);                                  //      86400
var_dump(1 * 60 * 60 * 24 * 366);                            //   31622400
var_dump(1 * 60 * 60 * 24 * 366 * 10);                       //  316224000
var_dump(1 * 60 * 60 * 24 * 366 * 20);                       //  632448000
var_dump(1 * 60 * 60 * 24 * 366 * 30);                       //  948672000
var_dump(1 * 60 * 60 * 24 * 366 * 31);                       //  980294400
var_dump(PHP_INT_MAX);                                       // 2147483647

Regarding PHP_INT_MAX I'm not sure what exactly changed in recent versions of PHP because the following clearly works in PHP 5.3.1, maybe someone could shed some light into this:

var_dump(base_convert(PHP_INT_MAX, 10, 36));                 // zik0zj
var_dump(base_convert(PHP_INT_MAX + 1, 10, 36));             // zik0zk
var_dump(base_convert(PHP_INT_MAX + 2, 10, 36));             // zik0zl
var_dump(base_convert(PHP_INT_MAX * 2, 10, 36));             // 1z141z2
var_dump(base_convert(PHP_INT_MAX * 2 + 1, 10, 36));         // 1z141z3
var_dump(base_convert(PHP_INT_MAX * 2 + 2, 10, 36));         // 1z141z4

I got kinda lost with my rationalization here and I'm bored so I'll just finish really quick. We can use pretty much the whole base 36 charset and safely generate sequential codes with a minimum guaranteed uniqueness factor of 1 unique code per second for 3.16887646 years using this:

base_convert(mt_rand(22, 782) . substr(time(), 2), 10, 36);

I just realized that the above can sometimes return duplicated values due to the first argument of mt_rand(), in order to produce unique results we need to limit a our base 36 charset a little bit:

base_convert(mt_rand(122, 782) . substr(time(), 2), 10, 36);

Remember that the above values are still sequential, in order to make them look random we can use microtime() but we can only ensure a uniqueness factor of 10 codes per second for 3.8 months:

base_convert(mt_rand(122, 782) . substr(number_format(microtime(true), 1, '', ''), 3), 10, 36);

This proved to be more difficult than I originally antecipated since there are lot of constrains:

  • use the whole base 36 charset
  • generate random-looking codes
  • trade-offs between uniqueness factor per second and durability of uniqueness
  • PHP integer limitations

If we can ignore any of the above it would be a lot easier and I'm sure this can be further optimized but like I said: this is boring me. Maybe someone would like to pick this up where I left. =) I'm hungry! =S

Alix Axel
A: 
md5( microtime() );
David Morrow
A: 

Galen's answer allows only one use of each character in the password. Not much information in that string. A simple change though:

$chars = 'abcdefghijklmnopqrstuvwxyz0123456789'; $passwordlength = 7; for ($x = 1; $x <= $passwordlength; $x++) { $charlist .= $chars; } $temp_pw = substr( str_shuffle( $charlist ), 0, $passwordlength );

Mike