I've been trying to find a more Pythonic way of generating a random string in Python that also scales well. Typically, I see something similar to

''.join(random.choice(string.letters) for i in xrange(len))

This is slow if you want to generate a long string, because it makes one random.choice call per character.

I've been thinking about random.getrandbits for a while, trying to figure out how to convert its output to an array of bytes and then encode that. Using Python 2.6 I came across the bytearray object, which isn't documented. Somehow I got it to work, and it seems really fast.

It generates a 50-million-character random string on my notebook in about 3 seconds.

import base64
import random

def rand1(leng):
    nbits = leng * 6 + 1  # base64 carries 6 bits per output character
    bits = random.getrandbits(nbits)
    uc = u"%0x" % bits
    newlen = len(uc) // 2 * 2  # fromhex() needs an even number of hex digits
    ba = bytearray.fromhex(uc[:newlen])
    return base64.urlsafe_b64encode(str(ba))[:leng]
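For readers on Python 3, the same idea can be sketched without the hex round trip. This is only a rough equivalent, assuming Python 3; the name rand_urlsafe is made up for illustration. It uses int.to_bytes, which pads with leading zero bytes, so the result cannot come up short when the random integer happens to have leading zero bits.

```python
import base64
import random


def rand_urlsafe(length):
    """Return `length` random URL-safe base64 characters (hypothetical helper).

    Uses the `random` module, so the result is NOT cryptographically secure.
    """
    if length <= 0:
        return ""
    # Each base64 character carries 6 bits; round up to whole bytes.
    nbytes = (length * 6 + 7) // 8
    # to_bytes() zero-pads on the left, so we always get exactly nbytes bytes.
    raw = random.getrandbits(nbytes * 8).to_bytes(nbytes, "big")
    # Any '=' padding sits at the very end and is cut off by the slice.
    return base64.urlsafe_b64encode(raw)[:length].decode("ascii")
```

Calling rand_urlsafe(16) gives a 16-character string drawn from letters, digits, '-' and '_'.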


edit

heikogerlach pointed out that an odd number of characters was causing the issue. I added new code to make sure fromhex always receives an even number of hex digits.

Still curious if there's a better way of doing this that's just as fast.

+16  A: 
import os
random_string = os.urandom(string_length)
Seun Osewa
Ah! So simple. I didn't think it was cross-platform, but apparently it is.
mikelikespie
Just a followup, it's really odd, but at least on OS X, the getrandbits method is 2-3x faster.
mikelikespie
That's probably because os.urandom draws from a cryptographically secure PRNG (usually a stream cipher), while random is a "normal" PRNG, and those are usually much faster to compute.
Joey
Is there a way to use this to generate ASCII strings rather than unicode? For example, so the string can be used in a URL.
Derek Dahmer
You could use random.choice, string.digits, and string.letters like the first example:
>>> import random, string
>>> ''.join(random.choice(string.letters + string.digits) for i in xrange(10))
'FywhcRLmh1'
(I'm assuming you aren't generating an enormous string like the OP, since it's for a URL...)
jgeewax
For URLs one may want to use `string.ascii_letters`.
jholster
@Derek: You can encode the random string in base64 for a url.
Seun Osewa
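The base64 suggestion in the last comment could look roughly like the sketch below. It assumes Python 3, and the function name urandom_urlsafe is invented for this example. os.urandom returns raw bytes, so encoding them with the URL-safe base64 alphabet yields a plain ASCII string.

```python
import base64
import os


def urandom_urlsafe(length):
    """Return `length` URL-safe ASCII characters from os.urandom (hypothetical helper)."""
    # Three random bytes become four base64 characters, so request
    # ceil(length * 3 / 4) bytes to have enough characters before any padding.
    nbytes = (length * 3 + 3) // 4
    encoded = base64.urlsafe_b64encode(os.urandom(nbytes))
    # Slicing drops surplus characters and any trailing '=' padding.
    return encoded[:length].decode("ascii")
```

The output uses only letters, digits, '-' and '_', so it can go into a URL without further escaping.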
+2  A: 

It seems the fromhex() method expects an even number of hex digits. Your string is 75 characters long. Be aware that something[:-1] excludes the last element! Just use something[:].

unbeknown
There was a trailing L from __hex__(). I rewrote the sample code. Anyway, I think you were right about it requiring an even number of digits.
mikelikespie
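A small sketch of the even-digit requirement, assuming Python 3: fromhex() rejects a string with an odd number of hex digits, and left-padding with a single "0" fixes that without changing the value. The helper name to_even_hex is hypothetical.

```python
def to_even_hex(value):
    """Format a non-negative int as hex with an even number of digits (hypothetical helper)."""
    digits = "%x" % value
    if len(digits) % 2:
        digits = "0" + digits  # a leading zero keeps the numeric value unchanged
    return digits


# An odd-length hex string is rejected by fromhex().
try:
    bytearray.fromhex("abc")
    odd_accepted = True
except ValueError:
    odd_accepted = False

# The padded form "0abc" parses into two bytes.
padded = bytearray.fromhex(to_even_hex(0xABC))
```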
+2  A: 

Taken from the 1023290 bug report at Python.org:

junk_len = 1024
junk = (("%%0%dX" % junk_len) % random.getrandbits(junk_len * 8)).decode("hex")

Also see issues 923643 and 1023290.

fdr
+1  A: 

Regarding the last example, the following fix makes sure the hex string has an even length, whatever the value of junk_len. Each byte takes two hex digits, so the zero-padded width has to be junk_len * 2:

junk_len = 1024
junk =  (("%%0%dX" % (junk_len * 2)) % random.getrandbits(junk_len * 8)).decode("hex")
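The str.decode("hex") call above only exists in Python 2. As a rough Python 3 counterpart (an assumption on my part, not taken from the bug reports; the function name is made up), the same zero-padded formatting can be fed to bytes.fromhex:

```python
import random


def random_bytes_via_hex(junk_len):
    """Return junk_len pseudo-random bytes via a zero-padded hex string (hypothetical helper)."""
    # Two hex digits per byte: pad to junk_len * 2 so the length is always even,
    # even when the random integer has leading zero bits.
    hex_digits = "%0*X" % (junk_len * 2, random.getrandbits(junk_len * 8))
    return bytes.fromhex(hex_digits)


junk = random_bytes_via_hex(1024)
```

On Python 3.9 and later, random.randbytes(junk_len) does the same job directly.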