I have a big number that I need to split into smaller numbers in Python. I wrote the following code to convert between the two:


def split_number (num, part_size):
    string = str(num)
    string_size = len(string)

    arr = []
    pointer = 0 
    while pointer < string_size:
        e = pointer + part_size
        arr.append(int(string[pointer:e]))
        pointer += part_size
    return arr 

def join_number(arr):
    num = ""
    for x in arr:
        num += str(x)
    return int(num)

But the number comes back different. It's hard to debug because the number is so large, so before digging into that I thought I would post it here to see whether there is a better way to do it or whether I'm missing something obvious.

Thanks a lot.

+2  A: 

Clearly, any leading 0s in the "parts" can't be preserved by this operation. Can't join_number also receive the part_size argument, so that it can reconstruct the string formats with all the leading zeros?

Without some information such as part_size that's known to both the sender and receiver, or the equivalent (such as the base number to use for a similar split and join based on arithmetic, roughly equivalent to 10**part_size given the way you're using part_size), the task becomes quite a bit harder. If the receiver is initially clueless about this, why not just place the part_size (or base, etc) as the very first int in the arr list that's being sent and received? That way, the encoding trivially becomes "self-sufficient", i.e., doesn't need any supplementary parameter known to both sender and receiver.
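A minimal sketch of that last suggestion, assuming non-negative integers and a string-based split. The list layout (part_size first, then the chunks) and the left-padding with zfill are assumptions made for illustration, not something the question specifies:

```python
def split_number(num, part_size):
    """Split a non-negative int into chunks of part_size decimal digits.

    The returned list starts with part_size itself (an assumed layout),
    so the receiver needs no extra parameter.
    """
    digits = str(num)
    # Round the length up to a multiple of part_size and left-pad with zeros,
    # so every chunk, including the first, is exactly part_size digits wide.
    padded_len = -(-len(digits) // part_size) * part_size
    digits = digits.zfill(padded_len)
    parts = [int(digits[i:i + part_size])
             for i in range(0, len(digits), part_size)]
    return [part_size] + parts


def join_number(arr):
    """Rebuild the number from a list produced by split_number above."""
    part_size, parts = arr[0], arr[1:]
    # Restore the leading zeros that int() dropped from each chunk.
    return int("".join(str(p).zfill(part_size) for p in parts))


encoded = split_number(1000005, 3)
print(encoded)               # [3, 1, 0, 5]
print(join_number(encoded))  # 1000005
```

Because the padding goes on the left of the whole number, the extra zeros vanish again in the final int() call.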

Alex Martelli
Ahh, leading zeroes, of course. It is possible to make the first integer in the array the part_size. Thanks a lot for your help, I can't believe I missed that.
Reality
+1  A: 

You should think of the following number split into 3-sized chunks:

1000005 -> 100 000 5

You have two problems. The first is that if you put those integers back together, you'll get:

100 0 5 -> 10005

(i.e., the middle one is 0, not 000) which is not what you started with. The second problem is that you're not sure what size the last part should be.
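Both problems can be reproduced in a few lines. This is only a sketch: naive_split is a hypothetical name that mirrors the split in the question, and the input is assumed to be non-negative.

```python
def naive_split(num, part_size):
    # Same chunking as the question: cut the decimal string left to right.
    s = str(num)
    return [int(s[i:i + part_size]) for i in range(0, len(s), part_size)]


parts = naive_split(1000005, 3)
print(parts)      # [100, 0, 5]

# Problem 1: joining without padding loses the zeros of the middle chunk.
unpadded = int("".join(str(p) for p in parts))
print(unpadded)   # 10005

# Problem 2: padding every chunk to 3 digits over-pads the short last chunk.
padded = int("".join(str(p).zfill(3) for p in parts))
print(padded)     # 100000005
```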

I would ensure that you're first using a string whose length is an exact multiple of the part size so you know exactly how big each part should be:

def split_number (num, part_size):
    string = str(num)
    string_size = len(string)
    while string_size % part_size != 0:
        string = "0%s"%(string)
        string_size = string_size + 1

    arr = []
    pointer = 0
    while pointer < string_size:
        e = pointer + part_size
        arr.append(int(string[pointer:e]))
        pointer += part_size
    return arr

Secondly, make sure that you put the parts back together with the right length for each part (ensuring you don't put leading zeros on the first part of course):

def join_number(arr, part_size):
    fmt_str = "%%s%%0%dd"%(part_size)
    num = arr[0]
    for x in arr[1:]:
        num = fmt_str%(num,int(x))
    return int(num)

Tying it all together, the following complete program:

#!/usr/bin/python

def split_number (num, part_size):
    string = str(num)
    string_size = len(string)
    while string_size % part_size != 0:
        string = "0%s"%(string)
        string_size = string_size + 1

    arr = []
    pointer = 0
    while pointer < string_size:
        e = pointer + part_size
        arr.append(int(string[pointer:e]))
        pointer += part_size
    return arr

def join_number(arr, part_size):
    fmt_str = "%%s%%0%dd"%(part_size)
    num = arr[0]
    for x in arr[1:]:
        num = fmt_str%(num,int(x))
    return int(num)

x = 1000005
print(x)
y = split_number(x,3)
print(y)
z = join_number(y,3)
print(z)

produces the output:

1000005
[1, 0, 5]
1000005

which shows that it goes back together.

Just keep in mind I haven't done Python for a few years. There's almost certainly a more "Pythonic" way to do it with those new-fangled lambdas and things (or whatever Python calls them) but, since your code was of the basic form, I just answered with the minimal changes required to get it working. Oh yeah, and be wary of negative numbers :-)
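On the negative-number caveat: one possible way around it, sketched here with hypothetical names split_signed and join_signed, is to carry the sign separately and split only the absolute value. The list comprehension and zfill are one more compact spelling of the loops above, not the only one.

```python
def split_signed(num, part_size):
    """Return (sign, parts); parts are chunks of abs(num), most significant first."""
    sign = -1 if num < 0 else 1
    digits = str(abs(num))
    # Left-pad to a multiple of part_size so all chunks have the same width.
    padded_len = -(-len(digits) // part_size) * part_size
    digits = digits.zfill(padded_len)
    parts = [int(digits[i:i + part_size])
             for i in range(0, len(digits), part_size)]
    return sign, parts


def join_signed(sign, parts, part_size):
    """Inverse of split_signed for the same part_size."""
    return sign * int("".join(str(p).zfill(part_size) for p in parts))


sign, parts = split_signed(-1000005, 3)
print(sign, parts)                  # -1 [1, 0, 5]
print(join_signed(sign, parts, 3))  # -1000005
```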

paxdiablo
`num = '%s%0*d' % (num, part_size, int(x))`
Ignacio Vazquez-Abrams
A: 

I have updated the code as follows:


def split_number (num, part_size):
    string = str(num)
    string_size = len(string)

    arr = []
    pointer = 0
    while pointer < string_size:
        e = pointer + part_size
        arr.append(int(string[pointer:e]))
        pointer += part_size
    return arr

def join_number(arr, part_size):
    num = ""
    for x in arr:
        this_num = str(x)
        if len(this_num) < part_size:
            lead = ""
            for p in range(0,(part_size - len(this_num))):
                lead += "0"
            this_num = lead + this_num
        num += this_num

    return int(num)

but it's still losing some digits somewhere.
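One thing worth checking, sketched below for non-negative input: with this split, the last chunk can be shorter than part_size, so several different numbers produce the same list and no join can tell them apart from the list alone. A possible workaround is to return the width of the last chunk as well; that extra return value and the last_len parameter are assumptions added for illustration.

```python
def split_number(num, part_size):
    """Return (parts, last_len): the chunks and the digit count of the last one."""
    s = str(num)
    chunks = [s[i:i + part_size] for i in range(0, len(s), part_size)]
    return [int(c) for c in chunks], len(chunks[-1])


def join_number(parts, part_size, last_len):
    """Pad every chunk to part_size, except the last, which gets last_len."""
    widths = [part_size] * (len(parts) - 1) + [last_len]
    return int("".join(str(p).zfill(w) for p, w in zip(parts, widths)))


# Three different numbers, one identical list of parts:
for n in (1000005, 10000005, 100000005):
    parts, last_len = split_number(n, 3)
    print(n, parts, last_len, join_number(parts, 3, last_len))
```

Each of the three inputs splits to [100, 0, 5]; only last_len (1, 2 and 3 respectively) distinguishes them.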

Reality
Ahh, the issue is the last element.
Reality
A: 

There is no need to convert to and from strings, which can be very time-consuming for really large numbers.

>>> def split_number(n, part_size):
...     base = 10**part_size
...     L = []
...     while n:
...         n,part = divmod(n,base)
...         L.append(part)
...     return L[::-1]
... 
>>> def join_number(L, part_size):
...     base = 10**part_size
...     n = 0
...     L = L[::-1]
...     while L:
...         n = n*base+L.pop()
...     return n
... 
>>> print(split_number(1000005,3))
[1, 0, 5]
>>> print(join_number([1,0,5],3))
1000005
>>> 
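A quick way to gain confidence in the arithmetic version is a randomized round-trip check. This is a sketch using the standard random module; the seed, the iteration count and the value ranges are arbitrary choices. The functions are restated so the snippet runs on its own, with join_number written as a plain loop that behaves the same as the reverse-and-pop version.

```python
import random


def split_number(n, part_size):
    base = 10 ** part_size
    parts = []
    while n:
        n, part = divmod(n, base)
        parts.append(part)
    return parts[::-1]          # most significant chunk first


def join_number(parts, part_size):
    base = 10 ** part_size
    n = 0
    for part in parts:          # Horner-style accumulation
        n = n * base + part
    return n


rng = random.Random(0)
for _ in range(1000):
    n = rng.getrandbits(rng.randint(1, 2000))   # non-negative only
    part_size = rng.randint(1, 50)
    assert join_number(split_number(n, part_size), part_size) == n

# Edge case: zero splits to an empty list, which still joins back to 0.
print(split_number(0, 3))       # []
print(join_number([], 3))       # 0
```

Negative input is deliberately left out: divmod(-1, base) returns (-1, base - 1), so the while loop in split_number would never terminate for n < 0.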

Here you can see that just converting the number to a str takes longer than my entire function!

>>> from time import time
>>> t=time();b = split_number(2**100000,3000);print(time()-t)
0.204252004623
>>> t=time();b = split_number(2**100000,30);print(time()-t)
0.486856222153
>>> t=time();b = str(2**100000);print(time()-t)
0.730905056
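Single time() deltas are noisy, so here is a sketch of the same measurement with the standard timeit module. The repeat count of 3 is an arbitrary choice, and no timings are quoted because they depend on the machine and the Python version. The str() comparison is left out here: recent CPython releases limit int-to-str conversion length by default, so str(2**100000) may raise ValueError unless that limit is raised.

```python
import timeit


def split_number(n, part_size):
    base = 10 ** part_size
    parts = []
    while n:
        n, part = divmod(n, base)
        parts.append(part)
    return parts[::-1]


n = 2 ** 100000
timings = {}
for part_size in (3000, 30):
    # Average over a few runs instead of relying on one time() delta.
    total = timeit.timeit(lambda: split_number(n, part_size), number=3)
    timings[part_size] = total / 3
    print(part_size, timings[part_size])
```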
gnibbler
A: 

Here's some code for Alex Martelli's answer.

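def digits(n, base):
    # Yield the base-`base` digits of n, least significant first.
    while n:
        yield n % base
        n //= base

def split_number(n, part_size):
    base = 10 ** part_size
    # Note: chunks come out least significant first, e.g. 1000005 -> [5, 0, 1].
    return list(digits(n, base))

def join_number(parts, part_size):
    # `parts` must be in the order split_number returns (least significant first).
    base = 10 ** part_size
    return sum(d * (base ** i) for i, d in enumerate(parts))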
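A short usage sketch, with the functions repeated so it runs on its own. The thing to watch is the chunk order: this version yields the least significant chunk first, the reverse of the other answers. The parameter name parts is a substitution made here for readability.

```python
def digits(n, base):
    # Generator: least significant digit (in the given base) first.
    while n:
        yield n % base
        n //= base


def split_number(n, part_size):
    return list(digits(n, 10 ** part_size))


def join_number(parts, part_size):
    base = 10 ** part_size
    return sum(d * (base ** i) for i, d in enumerate(parts))


parts = split_number(1000005, 3)
print(parts)                         # [5, 0, 1]
print(join_number(parts, 3))         # 1000005

# Reverse the list for the most-significant-first order used elsewhere
# in this thread.
print(parts[::-1])                   # [1, 0, 5]
```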
Paul Hankin