Using the PHP pack() function, I have converted a string into a binary hex representation:

$string = md5(time()); // 32-character hex string
$packed = pack('H*', $string);

The H* formatting means "Hex string, high nibble first".

To unpack this in PHP, I would simply use the unpack() function with the H* format flag.

How would I unpack this data in Python?

+7  A: 

In Python you use the struct module for this.

>>> from struct import *
>>> # '>' selects big-endian byte order with standard sizes and no padding
>>> pack('>hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('>hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('>hhl')
8

HTH

Note: "h" means something different in struct than "nibble encoded as hex"; it refers to a 16-bit signed integer.
Brian
+6  A: 

There's no corresponding "hex nibble" code for struct.pack, so you'll either need to manually pack into bytes first, like:

import struct

hex_string = 'abcdef12'

# Convert each hex digit to its 4-bit value, then combine pairs (high nibble first)
hexdigits = [int(x, 16) for x in hex_string]
data = ''.join(struct.pack('B', (high << 4) + low)
               for high, low in zip(hexdigits[::2], hexdigits[1::2]))
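
The snippet above is Python 2, where packed data is a str. As a minimal sketch of the same high-nibble-first idea on Python 3 (the helper name pack_hex_high_nibble_first is hypothetical, and it assumes an even number of hex digits rather than reproducing how PHP handles odd-length input):

```python
def pack_hex_high_nibble_first(hex_string):
    """Hypothetical helper: pack pairs of hex digits into bytes, high nibble first."""
    if len(hex_string) % 2:
        # Assumption: reject odd-length input instead of padding it.
        raise ValueError("expected an even number of hex digits")
    digits = [int(ch, 16) for ch in hex_string]
    # The first digit of each pair becomes the upper 4 bits of the byte.
    return bytes((high << 4) | low
                 for high, low in zip(digits[::2], digits[1::2]))


data = pack_hex_high_nibble_first('abcdef12')
print(data)  # b'\xab\xcd\xef\x12'
```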

Or better, you can just use the hex codec, i.e.:

>>> data = hex_string.decode('hex')
>>> data
'\xab\xcd\xef\x12'

To unpack, you can encode the result back to hex in the same way:

>>> data.encode('hex')
'abcdef12'
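
The 'hex' codec on str is a Python 2 feature. On Python 3, a rough equivalent of the round trip uses the built-in bytes.fromhex() and bytes.hex() (the latter available since Python 3.5); this is a sketch of the same conversion, not part of the original answer:

```python
hex_string = 'abcdef12'

# Pack: hex text -> raw bytes (what PHP's pack('H*', ...) produces)
data = bytes.fromhex(hex_string)
print(data)        # b'\xab\xcd\xef\x12'

# Unpack: raw bytes -> hex text (what PHP's unpack('H*', ...) produces)
print(data.hex())  # abcdef12
```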

However, note that for your example, there is probably no need to round-trip through a hex representation at all when encoding. Just use the MD5 binary digest directly, i.e.:

>>> import md5
>>> x = md5.md5('some string')
>>> x.digest()
'Z\xc7I\xfb\xee\xc96\x07\xfc(\xd6f\xbe\x85\xe7:'

This is equivalent to your pack()ed representation. To get the hex representation, use the same unpack method as above (encode to hex), or simply call hexdigest():

>>> x.digest().encode('hex')
'5ac749fbeec93607fc28d666be85e73a'
>>> x.hexdigest()
'5ac749fbeec93607fc28d666be85e73a'
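
On Python 3 the md5 module no longer exists and hashing takes bytes, so a sketch of the same idea would go through hashlib instead. The input b'some string' simply mirrors the example above; the checks show that digest() is the packed form and hexdigest() is its hex text:

```python
import hashlib

m = hashlib.md5(b'some string')

raw = m.digest()          # 16 raw bytes, same shape as PHP's pack('H*', md5(...))
hex_text = m.hexdigest()  # 32 hex characters

# The two forms convert into each other without any manual packing.
assert len(raw) == 16 and len(hex_text) == 32
assert raw.hex() == hex_text
assert bytes.fromhex(hex_text) == raw
```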

[Edit]: Updated to use better method (hex codec)

Brian
In the first version, is there anything special to import to use the group statement?
Leandro López
@Leandro: Oops - group() was a function in my own library (break a sequence into groups of N characters). I've updated the code to just use a slice to avoid the undefined function.
Brian
+4  A: 

There's an easy way to do this with the binascii module:

>>> import binascii
>>> binascii.hexlify("ABCZ")
'4142435a'

Unless I'm misunderstanding something about the nibble ordering (high-nibble first is the default), that should be perfectly sufficient!

Furthermore, Python's hashlib.md5 objects have a hexdigest() method to automatically convert the MD5 digest to an ASCII hex string, so that this method isn't even necessary for MD5 digests. Hope that helps.
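
For completeness, binascii also provides the reverse direction, unhexlify(). A short sketch on Python 3, where both functions take and return bytes (the sample value is the one from the example above):

```python
import binascii

# Pack: hex digits -> raw bytes
packed = binascii.unhexlify('4142435a')
print(packed)                    # b'ABCZ'

# Unpack: raw bytes -> hex digits (returned as bytes, not str)
print(binascii.hexlify(packed))  # b'4142435a'
```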

Dan