I want to convert an integer (int or long) to a big-endian byte string. The byte string has to be of variable length, so that only the minimum number of bytes is used (the total length of the preceding data is known, so the variable length can be inferred).

My current solution is

import bitstring

bitstring.BitString(hex=hex(456)).tobytes()

This obviously depends on the endianness of the machine and gives wrong results, because zero bits are appended rather than prepended.

Does anyone know a way to do this without making any assumptions about the length or endianness of an int?

+5  A: 

Something like this. Untested (until next edit). For Python 2.x. Assumes n > 0.

tmp = []
while n:
    n, d = divmod(n, 256)
    tmp.append(chr(d))
result = ''.join(tmp[::-1])

Edit: tested.
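
For readers on Python 3, where chr() returns text rather than a byte, the same loop might be sketched as follows. The function name int_to_min_bytes and the choice to encode 0 as a single zero byte are assumptions for illustration, not part of the answer above.

```python
def int_to_min_bytes(n):
    """Encode a non-negative int as big-endian bytes of minimal length."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while n:
        n, d = divmod(n, 256)   # d is the current least significant byte
        digits.append(d)
    # Digits were collected least significant first, so reverse them.
    # Assumption: zero is encoded as one zero byte rather than b"".
    return bytes(reversed(digits)) or b"\x00"

print(int_to_min_bytes(456))    # b'\x01\xc8'
print(int_to_min_bytes(65539))  # b'\x01\x00\x03'
```

Collecting the digits in a list and reversing once keeps the loop linear in the number of bytes.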

If you don't read manuals but like bitbashing, instead of the divmod caper, try this:

d = n & 0xFF; n >>= 8

Edit 2: If your numbers are relatively small, the following may be faster:

result = ''
while n:
    result = chr(n & 0xFF) + result
    n >>= 8

Edit 3: The second method doesn't assume that the int is already big-endian. Here's what happens in a notoriously little-endian environment:

Python 2.7 (r27:82525, Jul  4 2010, 09:01:59) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> n = 65539
>>> result = ''
>>> while n:
...     result = chr(n & 0xFF) + result
...     n >>= 8
...
>>> result
'\x01\x00\x03'
>>> import sys; sys.byteorder
'little'
>>>
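
On Python 3.2 or later (an assumption; the session above is Python 2.7), the built-in int.to_bytes takes an explicit byte order, so its result likewise does not depend on sys.byteorder. A minimal sketch, with the hypothetical helper name to_big_endian:

```python
import sys

def to_big_endian(n):
    # Minimal byte count for n; max(1, ...) makes 0 encode as one
    # zero byte, which is an assumption about the desired format.
    length = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(length, "big")

print(to_big_endian(65539))  # b'\x01\x00\x03'
print(sys.byteorder)         # depends on the machine; the line above does not
```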
John Machin
This assumes that 1 byte equals 8 bits. I don't know if you can make this assumption with regard to Python's semantics. The second method assumes that the integer is already big-endian.
ott
@ott: It's quite safe to say that 1 byte equals 8 bits, and Python integers themselves don't have endianness - it's only an issue in how they are stored or transmitted (i.e. it's only a problem if you've incorrectly unpacked `n` from somewhere before getting this far). Both methods look fine to me.
Scott Griffiths
Actually, it merely assumes that a byte is at *least* 8 bits, which is guaranteed by the C standard, and thus by the C PyBytes type.
dan04
John Machin
+1  A: 

A solution using struct and itertools:

>>> import itertools, struct
>>> "".join(itertools.dropwhile(lambda c: not(ord(c)), struct.pack(">i", 456))) or chr(0)
'\x01\xc8'

We can drop itertools by using a simple string strip:

>>> struct.pack(">i", 456).lstrip(chr(0)) or chr(0)
'\x01\xc8'

Or even drop struct using a recursive function:

def to_bytes(n): 
    return ([chr(n & 255)] + to_bytes(n >> 8) if n > 0 else [])

"".join(reversed(to_bytes(456))) or chr(0)
tokland
The `struct.pack` method doesn't work, because `struct.unpack` requires a fixed length. For the other methods you would also need a reverse function (trivial).
ott
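
The reverse function mentioned in the comment above could look like the following sketch. It is written for Python 3 (an assumption), and the name from_bytes_be is hypothetical. It needs no fixed length because it consumes however many bytes it is given.

```python
def from_bytes_be(data):
    """Decode big-endian bytes of any length into a non-negative int."""
    n = 0
    for byte in data:          # iterating over bytes yields ints in Python 3
        n = (n << 8) | byte    # shift earlier bytes up, append the next one
    return n

print(from_bytes_be(b"\x01\xc8"))  # 456
```

On Python 3.2 or later, int.from_bytes(data, "big") should give the same result.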
+1  A: 

If you're using Python 2.7 or later then you can use the bit_length method to round the length up to the next byte:

>>> i = 456
>>> bitstring.BitString(uint=i, length=(i.bit_length()+7)/8*8).bytes
'\x01\xc8'

otherwise you can just test for whole-byteness and pad with a zero nibble at the start if needed:

>>> s = bitstring.BitString(hex=hex(i))
>>> ('0x0' + s if s.len%8 else s).bytes
'\x01\xc8'
Scott Griffiths
`bit_length` seems to be a clean solution (though I'm on Python 2.6 on Debian). `(i.bit_length()+7)/8*8` rounds the length up to the next multiple of 8, am I right? The endianness problem also still exists.
ott
I found an [explanation for the rounding](http://stackoverflow.com/questions/2403631/how-do-i-find-the-next-multiple-of-10-of-any-integer). So only the endianness problem remains.
ott
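
The rounding can be checked with a short sketch. It is written for Python 3 (an assumption), where integer division must be spelled //; the helper name round_up_to_byte is hypothetical.

```python
def round_up_to_byte(bits):
    # Adding 7 before floor-dividing by 8 rounds up to a whole number
    # of bytes; multiplying by 8 converts that byte count back to bits.
    return (bits + 7) // 8 * 8

# Print each value, its bit length, and the padded bit length.
for i in (1, 255, 256, 456, 65539):
    print(i, i.bit_length(), round_up_to_byte(i.bit_length()))
```

A bit length that is already a multiple of 8 is left unchanged, since adding 7 is not enough to reach the next multiple.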
`uint` is an alias for `uintbe`, so the endianness problem is also solved.
ott
This was a bit more difficult than it needed to be, so I've added a feature request (http://code.google.com/p/python-bitstring/issues/detail?id=99) so hopefully in the next version you could say just `BitString(uintbe=456).bytes`. :)
Scott Griffiths