tags:

views:

418

answers:

5

The struct.pack() function allows converting integers of up to 64 bit to byte strings. What's the most efficient way to pack an even larger integer? I'd rather not add a dependency on non-standard modules like PyCrypto (which provides num_to_bytes()).

+1  A: 

As suggested by S.Lott in a comment, just convert the number to a string and pack that string. For example,

x = 2 ** 12345
struct.pack("40s", str(x))
Eli Courtwright
That is even less efficient than just storing the string representation of the integer, due to padding the int with null bytes.
Mike Boers
The question is unclear what type of efficiency is required? Are we looking for the most efficient use of space? Or perhaps the least processing time to pack the struct? For that matter, is efficiency even a big deal? The question asked for the most efficient answer, but people often optimize prematurely.
Eli Courtwright
+2  A: 

Assuming the poster wants to pack a large integer as a binary string, i.e. not use one byte of storage per digit in the number. One way of doing this seems to be:

import marshal

a = 47L
print marshal.dumps(a)

This prints:

'l\x01\x00\x00\x00/\x00'

I can't say that I understand how to interpret these bits, right now ...

unwind
That is going in the right direction, but it still has two redundant bytes at the front.
Mike Boers
@Mike: Actually more than that - I think the "l" and first 4 digits are just a count of the content, followed by a single byte ("/" == chr(47)) and a null at the end. It also looks like marshal is doing a more complex encoding scheme just including the raw bytes (look at dumps(2**64-1) for example and not the 0x7f bytes in the middle.
Brian
A: 

I take it you mean you only want to use as many bytes as you need to represent the number? e.g. if the number is:

  • 255 or less you'd use only 1 byte
  • 65535 or less 2 bytes
  • 16777215 or less 3 bytes
  • etc etc

On the Psion PDA they'd usually have some of packing scheme in which you read the first byte, detect if it has the highest bit set and then read another byte if it has. That way you'd just keep reading bytes until you read the "full" number. That system works quite well if most of the numbers you are dealing with are fairly small, as you'll normally only use one or two bytes per number.

The alternative is to have one (or more) bytes representing the number of total bytes used, but at that point it's basically a string in Python anyway. i.e. it's a string of base-256 digits.

John Montgomery
+2  A: 

Do you mean something like this:

def num_to_bytes(num):
    bytes = []
    num = abs(num) # Because I am unsure about negatives...
    while num > 0:
        bytes.append(chr(num % 256))
        num >>= 8
    return ''.join(reversed(bytes))

def bytes_to_num(bytes):
    num = 0
    for byte in bytes:
        num <<= 8
        num += ord(byte)
    return num

for n in (1, 16, 256, 257, 1234567890987654321):
    print n,
    print num_to_bytes(n).encode('hex'),
    print bytes_to_num(num_to_bytes(n))

Which returns:

1 01 1
16 10 16
256 0100 256
257 0101 257
1234567890987654321 112210f4b16c1cb1 1234567890987654321

I'm just not sure what to do about negatives... I'm not that familiar with bit twidling.

EDIT: Another solution (which runs about 30% faster by my tests):

def num_to_bytes(num):
    num = hex(num)[2:].rstrip('L')
    if len(num) % 2:
        return ('0%s' % num).decode('hex')
    return num.decode('hex')

def bytes_to_num(bytes):
    return int(bytes.encode('hex'), 16)
Mike Boers
Yes, this is what I mean, and I wrote such a function a while ago. However I wanted something that's easier to "port", like a list-comprehension trick or (better) a library function I just didn't know about.
noamtm
I just posted another go with built in int/hex transcoding and hex/str transconding... It goes a little faster too!
Mike Boers
@noamtm: Please update the question with additional facts. Don't just hide information in comments.
S.Lott
+1  A: 

This is a bit hacky, but you could go via the hex string representation, and there to binary with the hex codec:

>>> a = 2**60
>>> a
1152921504606846976L
>>> hex(a)
'0x1000000000000000L'
>>> hex(a).rstrip("L")[2:].decode('hex')
'\x10\x00\x00\x00\x00\x00\x00\x00'       # 8bytes, as expected.

>>> int(_.encode('hex'), 16)
1152921504606846976L

It breaks a little because the hex codec requires an even number of digits, so you'll need to pad for that, and you'll need to set a flag to handle negative numbers. Here's a generic pack / unpack:

def pack(num):
    if num <0:
       num = (abs(num) << 1) | 1    # Hack - encode sign as lowest bit.
    else:
       num = num << 1
    hexval = hex(num).rstrip("L")[2:]
    if len(hexval)%2 ==1: hexval = '0' + hexval
    return hexval.decode('hex')

def unpack(s):
    val = int(s.encode('hex'), 16)
    sign = -1 if (val & 1) else 1
    return sign * (val>>1)


for i in [10,4534,23467, 93485093485, 2**50, 2**60-1, -1, -20, -2**60]:
    assert unpack(pack(i)) == i

With all the fiddling for padding etc required, I'm not sure it's much better than a hand-rolled solution though.

Brian