ansaurus

Question

Efficient arbitrary-sized integer packing in Python

Answer 1

+1 A:

As suggested by S.Lott in a comment, just convert the number to a string and pack that string. For example,

x = 2 ** 12345
struct.pack("40s", str(x))

Eli Courtwright 2009-04-22 14:46:59

That is even less efficient than just storing the string representation of the integer, due to padding the int with null bytes.

Mike Boers 2009-04-22 15:32:01

The question is unclear what type of efficiency is required? Are we looking for the most efficient use of space? Or perhaps the least processing time to pack the struct? For that matter, is efficiency even a big deal? The question asked for the most efficient answer, but people often optimize prematurely.

Eli Courtwright 2009-04-22 17:29:03

Answer 2

+2 A:

Assuming the poster wants to pack a large integer as a binary string, i.e. not use one byte of storage per digit in the number. One way of doing this seems to be:

import marshal

a = 47L
print marshal.dumps(a)

This prints:

'l\x01\x00\x00\x00/\x00'

I can't say that I understand how to interpret these bits, right now ...

unwind 2009-04-22 14:52:01

That is going in the right direction, but it still has two redundant bytes at the front.

Mike Boers 2009-04-22 15:35:50

@Mike: Actually more than that - I think the "l" and first 4 digits are just a count of the content, followed by a single byte ("/" == chr(47)) and a null at the end. It also looks like marshal is doing a more complex encoding scheme just including the raw bytes (look at dumps(2**64-1) for example and not the 0x7f bytes in the middle.

Brian 2009-04-22 15:56:28

Answer 3

A:

I take it you mean you only want to use as many bytes as you need to represent the number? e.g. if the number is:

255 or less you'd use only 1 byte
65535 or less 2 bytes
16777215 or less 3 bytes
etc etc

On the Psion PDA they'd usually have some of packing scheme in which you read the first byte, detect if it has the highest bit set and then read another byte if it has. That way you'd just keep reading bytes until you read the "full" number. That system works quite well if most of the numbers you are dealing with are fairly small, as you'll normally only use one or two bytes per number.

The alternative is to have one (or more) bytes representing the number of total bytes used, but at that point it's basically a string in Python anyway. i.e. it's a string of base-256 digits.

John Montgomery 2009-04-22 15:12:39

Answer 4

+2 A:

Do you mean something like this:

def num_to_bytes(num):
    bytes = []
    num = abs(num) # Because I am unsure about negatives...
    while num > 0:
        bytes.append(chr(num % 256))
        num >>= 8
    return ''.join(reversed(bytes))

def bytes_to_num(bytes):
    num = 0
    for byte in bytes:
        num <<= 8
        num += ord(byte)
    return num

for n in (1, 16, 256, 257, 1234567890987654321):
    print n,
    print num_to_bytes(n).encode('hex'),
    print bytes_to_num(num_to_bytes(n))

Which returns:

1 01 1
16 10 16
256 0100 256
257 0101 257
1234567890987654321 112210f4b16c1cb1 1234567890987654321

I'm just not sure what to do about negatives... I'm not that familiar with bit twidling.

EDIT: Another solution (which runs about 30% faster by my tests):

def num_to_bytes(num):
    num = hex(num)[2:].rstrip('L')
    if len(num) % 2:
        return ('0%s' % num).decode('hex')
    return num.decode('hex')

def bytes_to_num(bytes):
    return int(bytes.encode('hex'), 16)

Mike Boers 2009-04-22 15:18:32

Yes, this is what I mean, and I wrote such a function a while ago. However I wanted something that's easier to "port", like a list-comprehension trick or (better) a library function I just didn't know about.

noamtm 2009-04-22 15:38:53

I just posted another go with built in int/hex transcoding and hex/str transconding... It goes a little faster too!

Mike Boers 2009-04-22 15:49:15

@noamtm: Please update the question with additional facts. Don't just hide information in comments.

S.Lott 2009-04-22 19:58:31

Answer 5

+1 A:

This is a bit hacky, but you could go via the hex string representation, and there to binary with the hex codec:

>>> a = 2**60
>>> a
1152921504606846976L
>>> hex(a)
'0x1000000000000000L'
>>> hex(a).rstrip("L")[2:].decode('hex')
'\x10\x00\x00\x00\x00\x00\x00\x00'       # 8bytes, as expected.

>>> int(_.encode('hex'), 16)
1152921504606846976L

It breaks a little because the hex codec requires an even number of digits, so you'll need to pad for that, and you'll need to set a flag to handle negative numbers. Here's a generic pack / unpack:

def pack(num):
    if num <0:
       num = (abs(num) << 1) | 1    # Hack - encode sign as lowest bit.
    else:
       num = num << 1
    hexval = hex(num).rstrip("L")[2:]
    if len(hexval)%2 ==1: hexval = '0' + hexval
    return hexval.decode('hex')

def unpack(s):
    val = int(s.encode('hex'), 16)
    sign = -1 if (val & 1) else 1
    return sign * (val>>1)


for i in [10,4534,23467, 93485093485, 2**50, 2**60-1, -1, -20, -2**60]:
    assert unpack(pack(i)) == i

With all the fiddling for padding etc required, I'm not sure it's much better than a hand-rolled solution though.

Brian 2009-04-22 16:17:09

ansaurus

tags:

views:

answers:

Efficient arbitrary-sized integer packing in Python

related questions