Hello,

I am using an API that takes a name of at most 21 characters to represent an internal session, which has a lifetime of around two days. I would like the name not to be meaningful, perhaps by using some kind of hashing. An MD5 hex digest is 32 characters (SHA-1's is 40), which is too long; is there something else I could use?

For now I use userid[:10] + creation time (ddhhmmss) + 3 random chars.

Thanks,

+3  A: 

Why not take the first 21 chars of an MD5 or SHA-1 hex digest?

Alex Lebedev
True, that should be random enough.
coulix
Something like hashlib.md5(str(random.random())).hexdigest()[:21]
S.Lott
random.random() by default gets its seed from os.urandom, else from time.time. Assuming the OS supports os.urandom, you might as well use os.urandom(11).encode("hex")[:21].
Andrew Dalke
+2  A: 

The hexadecimal representation of MD5 has very poor randomness: you only get 4 bits of entropy per character.

Use random characters, something like:

import random
import string
"".join([random.choice(string.ascii_letters + string.digits + ".-")
        for i in xrange(21)])

In the choice put all the acceptable characters.

Using a real hash function such as SHA-1 will also give good results if done correctly, but the added complexity and CPU cost do not seem justified for your needs. You only want a random string.
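For readers on Python 3, here is a minimal sketch of the same idea. It assumes the `secrets` module (Python 3.6+) is available and that the API accepts letters, digits, "." and "-"; the names `ALPHABET` and `random_name` are made up for illustration.

```python
import secrets
import string

# Assumed set of acceptable characters; adjust to what the API really allows.
ALPHABET = string.ascii_letters + string.digits + ".-"

def random_name(length=21):
    """Return `length` characters drawn independently from ALPHABET."""
    # secrets.choice draws from the OS random source rather than the
    # seeded generator behind random.choice.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(random_name())
```

With 64 possible characters, each position carries 6 bits, the same density as base-64.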

kmkaplan
import string; print string.letters + string.digits + ".-"
pi
@Pi: edited to do this.
kmkaplan
string.ascii_letters, since string.letters is locale dependent.
Andrew Dalke
A: 

Characters, or bytes? If it takes arbitrary strings, you can just use the bytes and not worry about expanding to readable characters (for which base64 would be better than hex anyway).

MD5 generates 16 bytes if you don't use the hexadecimal expansion of it. SHA1 generates 20 under the same condition.

>>> import hashlib
>>> len(hashlib.md5('foobar').digest())
16
>>> len(hashlib.sha1('foobar').digest())
20

Either digest fits within the 21-byte limit, with a few bytes to spare.

Devin Jeanpierre
+8  A: 

If I read your question correctly, you want to generate some arbitrary identifier token which must be 21 characters max. Does it need to be highly resistant to guessing? The example you gave isn't "cryptographically strong", in that it can be guessed by searching far less than half of the entire possible keyspace.

You don't say if the characters can be any of the 256 possible byte values, or if they need to be limited to, say, printable ASCII (33-126, inclusive), or some smaller range.

There is a Python module designed for UUIDs (Universally Unique IDentifiers). You likely want uuid4, which generates a random UUID and uses OS support if available (on Linux, Mac, FreeBSD, and likely others).

>>> import uuid
>>> u = uuid.uuid4()
>>> u
UUID('d94303e7-1be4-49ef-92f2-472bc4b4286d')
>>> u.bytes
'\xd9C\x03\xe7\x1b\xe4I\xef\x92\xf2G+\xc4\xb4(m'
>>> len(u.bytes)
16
>>>

16 random bytes is very unguessable, and there's no need to use the full 21 bytes your API allows, if all you want is to have an unguessable opaque identifier.

If you can't use raw bytes like that (and raw bytes are probably a bad idea anyway, because they are harder to use in logs and other debug messages and harder to compare by eye), then convert the bytes into something more readable, such as base-64 encoding, with the result chopped down to 21 (or however many) characters:

>>> u.bytes.encode("base64")
'2UMD5xvkSe+S8kcrxLQobQ==\n'
>>> len(u.bytes.encode("base64")) 
25
>>> u.bytes.encode("base64")[:21]
'2UMD5xvkSe+S8kcrxLQob'
>>>

This gives you an extremely high quality random string of length 21.

You might not like the '+' or '/' which can be in a base-64 string, since without proper escaping they might interfere with URLs. Since you already plan to use "random 3 chars", I don't think this is a worry of yours. If it is, you could replace those characters with something else ('-' and '.' might work), or remove them if present.

As others have pointed out, you could use .encode("hex") and get the hex equivalent, but that gives only 4 bits of randomness per character, so 21 characters carry at most 84 bits, instead of the 126 bits that 21 base-64 characters can carry. Every bit doubles your keyspace, so the hex keyspace is much smaller in theory: by a factor of 2^42, or about 4E12.

Your keyspace is still 2^84, or about 2E25, in size even with hex encoding, so I think it's more a theoretical concern. I wouldn't worry about people doing brute force attacks against your system.
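The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch of the calculation; the names `BITS_PER_CHAR` and `keyspace` are made up for illustration.

```python
LENGTH = 21  # maximum name length allowed by the API

# Bits carried by one character of each encoding.
BITS_PER_CHAR = {"hex": 4, "base64": 6}

def keyspace(encoding, length=LENGTH):
    """Number of distinct strings of `length` characters in `encoding`."""
    return 2 ** (BITS_PER_CHAR[encoding] * length)

hex_space = keyspace("hex")        # 2**84
b64_space = keyspace("base64")     # 2**126

print("hex:    %d bits, %.1e names" % (4 * LENGTH, hex_space))
print("base64: %d bits, %.1e names" % (6 * LENGTH, b64_space))
print("ratio:  2**42 = %d" % (b64_space // hex_space))
```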

Edit:

P.S.: The uuid.uuid4 function uses libuuid if available. That gets its entropy from os.urandom (if available) otherwise from the current time and the local ethernet MAC address. If libuuid is not available then the uuid.uuid4 function gets the bytes directly from os.urandom (if available) otherwise it uses the random module. The random module uses a default seed based on os.urandom (if available) otherwise a value based on the current time. Probing takes place for every function call, so if you don't have os.urandom then the overhead is a bit bigger than you might expect.

Take home message? If you know you have os.urandom then you could do

os.urandom(16).encode("base64")[:21]

but if you don't want to worry about its availability then use the uuid module.
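Note that the str.encode("base64") codec exists only in Python 2. A rough Python 3 equivalent, assuming os.urandom is available, might look like the following; `random_token` is a name made up for this sketch.

```python
import base64
import os

def random_token(length=21):
    """Base-64 encode 16 random bytes and keep the first `length` chars."""
    raw = os.urandom(16)
    # b64encode returns bytes; 16 input bytes give 24 characters,
    # the last two of which are '=' padding.
    encoded = base64.b64encode(raw).decode("ascii")
    return encoded[:length]

print(random_token())
```

The result can still contain '+' and '/', just like the Python 2 one-liner.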

Andrew Dalke
I forgot to add that it should be URL-safe; I should have specified that for the three random chars. I will use your method and replace the + and / chars.
coulix
I found a uri_b64encode safe method which would do the job nicely, thanks.
coulix
Note that UUID4 doesn't give you exactly 16 random bytes. There are 6 fixed (non-random) bits. Of course, the remaining 122 bits are plenty.
kmkaplan
Really? I looked at uuid.py and it seems to give 16 random bytes ... Ah-ha! The constructor does some bit fiddling based on the version number. That was downstream from where I looked. Thanks for the correction kmkaplan.
Andrew Dalke
A: 

The base64 module can do URL-safe encoding. So, if needed, instead of

u.bytes.encode("base64")

you could do

import base64

token = base64.urlsafe_b64encode(u.bytes)

and, conveniently, to convert back

u = uuid.UUID(bytes=base64.urlsafe_b64decode(token))
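One caveat: a 16-byte UUID encodes to 24 base-64 characters (22 without the "=" padding), so it cannot be decoded back once it is cut to 21. If the round trip matters, one option is to start from 15 random bytes, which encode to exactly 20 characters with no padding. A minimal Python 3 sketch, assuming os.urandom is available; `new_token` and `token_to_bytes` are made-up names:

```python
import base64
import os

def new_token():
    """15 random bytes -> 20 URL-safe characters, no '=' padding."""
    # 15 is a multiple of 3, so base-64 needs no padding characters.
    return base64.urlsafe_b64encode(os.urandom(15)).decode("ascii")

def token_to_bytes(token):
    """Recover the original 15 bytes from a token made by new_token()."""
    return base64.urlsafe_b64decode(token)

token = new_token()
print(token, len(token))
```

This gives 120 random bits, slightly fewer than the 122 in a uuid4, and fits the 21-character limit without truncation.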
Neil