
Suppose you have an MD5 hash encoded in base64. Each character of the resulting 22-character string (excluding the trailing '==') carries only 6 bits. Thus, each base64 MD5 hash could shrink down to 6*22 = 132 bits, which requires 25% less memory than the original 8*22 = 176-bit string.
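The sizes in the question can be checked with a short Python 3 sketch. The input b"hello world" is just an arbitrary example:

```python
import base64
import hashlib

digest = hashlib.md5(b"hello world").digest()  # raw 16-byte MD5 digest
encoded = base64.b64encode(digest)             # 24 ASCII bytes, ending in b"=="
stripped = encoded.rstrip(b"=")                # the 22 significant characters

print(len(digest), len(encoded), len(stripped))  # 16 24 22
print(len(stripped) * 6, len(stripped) * 8)      # 132 176
```

Note that even the 132-bit packing is larger than the digest itself: 16 bytes are 128 bits, so the last base64 character carries only 2 meaningful bits.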

Is there any Python module or function that lets you store base64 data in the way described above?

+5  A: 

The most efficient way to store base64-encoded data is to decode it and store it as binary. Base64 is a transport encoding; there's no sense in storing data in it, especially in memory, unless you have a compelling reason to.
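A minimal sketch of that advice in Python 3, assuming the digests arrive as base64 text (the sample inputs and variable names are illustrative only): decode once, keep the bytes, and encode again only at the transport boundary.

```python
import base64
import hashlib

# Illustrative incoming data: base64-encoded MD5 digests as they might
# arrive over a text-only channel.
incoming = [base64.b64encode(hashlib.md5(s).digest()).decode("ascii")
            for s in (b"alpha", b"beta", b"gamma")]

# Decode once at the boundary and keep only the raw 16-byte digests.
stored = {base64.b64decode(s) for s in incoming}

# Re-encode only when the digests have to travel as text again.
outgoing = sorted(base64.b64encode(d).decode("ascii") for d in stored)
```

Because base64 decoding is lossless, nothing is given up by keeping only the bytes; the text form can always be regenerated.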

Also, a nitpick: the output of a hash function is not a hex string; that's just a common representation. The output of a hash function is some number of bytes of binary data. If you're using the hashlib module (or the older md5 and sha modules in Python 2), for example, you don't need to encode it as anything in the first place: just call .digest() instead of .hexdigest() on the hash object.
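For example, in Python 3 (the input b"hello world" is arbitrary):

```python
import hashlib

h = hashlib.md5(b"hello world")
raw = h.digest()       # the actual hash output: 16 bytes of binary data
text = h.hexdigest()   # one common text representation: 32 hex characters

print(len(raw), len(text))  # 16 32

# The hex string is just another view of the same bytes.
assert bytes.fromhex(text) == raw
```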

Nick Johnson
+3  A: 

Simply decode the base64 data to binary:

>>> import base64
>>> b64 = "COIC09jwcwjiciOEIWIUNIUNE9832iun"
>>> len(b64)
32
>>> b = base64.b64decode(b64)
>>> b
b'\x08\xe2\x02\xd3\xd8\xf0s\x08\xe2r#\x84!b\x144\x85\r\x13\xdf7\xda+\xa7'
>>> len(b)
24
Ned Batchelder
+1  A: 

"store base64 data"

Don't.

Do. Not. Store. Base64. Data.

Base64 encodes something by making it bigger: every 3 bytes become 4 characters.

Store the original something. Never store the base64 encoding of something.
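To put a number on "bigger", a short Python 3 sketch compares input and encoded sizes (all-zero bytes are used as arbitrary sample data):

```python
import base64

# Encoded length (in characters) for a few input sizes, using all-zero
# bytes as arbitrary sample data.
sizes = {n: len(base64.b64encode(bytes(n))) for n in (1, 2, 3, 16, 100, 1000)}

for n, encoded_len in sizes.items():
    # Padded base64 emits 4 characters for every 3 input bytes, rounded up.
    assert encoded_len == 4 * ((n + 2) // 3)
    print(f"{n:5d} bytes -> {encoded_len:5d} chars ({encoded_len / n:.2f}x)")
```

The ratio approaches 4/3 for large inputs, and padding makes it worse for short ones: a 16-byte MD5 digest becomes 24 characters.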

S.Lott
A: 

David gave an answer that works on all base64 strings.

Just use base64.b64decode from the base64 module. That is,

import base64
binary = base64.b64decode(base64_string)

gives a more memory-efficient representation of the data than the original base64 string. If you are truncating the trailing '==' from your base64 MD5, add it back before decoding:

base64.b64decode(md5 + '==')
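If other base64 strings might also have had their padding stripped, the number of '=' characters can be computed instead of hard-coding '=='. A minimal Python 3 sketch; b64decode_unpadded is a hypothetical helper name, and it assumes standard (not URL-safe) base64:

```python
import base64
import hashlib

def b64decode_unpadded(s):
    """Decode base64 text whose trailing '=' padding may have been stripped."""
    # Padded base64 always has a length that is a multiple of 4; pad back up to it.
    return base64.b64decode(s + "=" * (-len(s) % 4))

md5_b64 = base64.b64encode(hashlib.md5(b"hello world").digest()).decode("ascii")
truncated = md5_b64.rstrip("=")  # 22 characters, as in the question
raw = b64decode_unpadded(truncated)
assert raw == hashlib.md5(b"hello world").digest()
```

A length of 1 mod 4 is never valid base64, so such input still fails to decode rather than being silently accepted.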
OTZ