I wrote this Python code to convert objects to a string of ones and zeros, but decoding fails because the data can't be unpickled. This is the code:

def encode(obj):
    'convert an object to ones and zeros'
    def tobin(str):
        rstr = ''
        for f in str:
            if f == "0": rstr += "0000"
            elif f == "1": rstr += "0001"
            elif f == "2": rstr += "0010"
            elif f == "3": rstr += "0100"
            elif f == "4": rstr += "1000"
            elif f == "5": rstr += "1001"
            elif f == "6": rstr += "1010"
            elif f == "7": rstr += "1100"
            elif f == "8": rstr += "1101"
            elif f == "9": rstr += "1110"
            else: rstr += f
        return rstr
    import pickle, StringIO
    f = StringIO.StringIO()
    pickle.dump(obj, f)
    data = f.getvalue()
    import base64
    return tobin(base64.b16encode(base64.b16encode(data)))
def decode(data):
    def unbin(data):
        rstr = ''
        for f in data:
            if f == "0000": rstr += "0"
            elif f == "0001": rstr += "1"
            elif f == "0010": rstr += "2"
            elif f == "0100": rstr += "3"
            elif f == "1000": rstr += "4"
            elif f == "1001": rstr += "5"
            elif f == "1010": rstr += "6"
            elif f == "1100": rstr += "7"
            elif f == "1101": rstr += "8"
            elif f == "1110": rstr += "9"
        return rstr
    import base64
    ndata = base64.b16decode(base64.b16decode(unbin(data)))
    import pickle, StringIO
    f = StringIO.StringIO(ndata)
    obj = pickle.load(f)
    return obj
+2  A: 

I think there are several problems, but one is that when you decode, you need to iterate through groups of 4 characters in your unbin() function, not single characters as you currently do.
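
A minimal sketch of that fix, assuming Python 3 and the same digit-to-bits table as tobin() in the question. The names `CODES` and `DECODE` are illustrative and not part of the asker's code:

```python
# Same digit-to-bits table as tobin() in the question (index = digit).
CODES = ("0000", "0001", "0010", "0100", "1000",
         "1001", "1010", "1100", "1101", "1110")
# Reverse lookup: 4-character group -> digit character.
DECODE = {bits: str(digit) for digit, bits in enumerate(CODES)}

def unbin(data):
    """Decode a string of ones and zeros, four characters at a time."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    # Step through the input in 4-character groups instead of single characters.
    return "".join(DECODE[data[i:i + 4]] for i in range(0, len(data), 4))
```

An unknown group such as "0011" raises a KeyError here rather than being silently dropped, which makes a corrupted input easier to spot.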

Justin Peel
Thanks. I fixed it so it iterates through 4 characters and now it works fine.
JoeBob
A: 

Your tobin and unbin functions aren't inverses of each other, because tobin has an else clause that copies unrecognized characters verbatim into the output, but unbin has no else clause to pass them back.

Ned Batchelder
The else there should raise an exception, since it is unreachable by design: base64.b16encode(base64.b16encode(...)) ensures the output contains only digits.
Nas Banov
Last I looked, b16encode used 0-9 and A-F.
Ned Batchelder
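
The two comments above can be reconciled with a quick check. This is a sketch assuming Python 3 (where b16encode takes and returns bytes). One pass of b16encode does use 0-9 and A-F, but the second pass hex-dumps the ASCII codes of those characters (0x30-0x39 and 0x41-0x46), which contain only decimal digits:

```python
import base64

# Every possible byte value, so the first pass produces all 16 hex digits.
data = bytes(range(256))

once = base64.b16encode(data)    # alphabet: 0-9 and A-F
twice = base64.b16encode(once)   # hex dump of the ASCII codes of `once`

alphabet_once = set(once.decode("ascii"))
alphabet_twice = set(twice.decode("ascii"))

print(sorted(alphabet_once))
print(sorted(alphabet_twice))
```

So with the double encoding in the question, the else branch in tobin() should never run, and raising there would document that assumption.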
+1  A: 

I think I have a better solution for you. This should be even more secure, since it "encrypts" everything, not just numbers:

MAGIC = 0x15 # CHOOSE ANY TWO HEX DIGITS YOU LIKE

# THANKS TO NAS BANOV FOR THE FOLLOWING:
unbin = tobin = lambda s: ''.join(chr(ord(c) ^ MAGIC) for c in s)
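
For Python 3, where pickled data is bytes rather than str, the same idea could be sketched as follows. The name `xor_bytes` is made up for this example:

```python
MAGIC = 0x15  # same arbitrary key as above

def xor_bytes(data: bytes) -> bytes:
    """XOR every byte with MAGIC; applying it twice restores the input."""
    return bytes(b ^ MAGIC for b in data)

scrambled = xor_bytes(b"hello")
restored = xor_bytes(scrambled)
```

Because XOR with a fixed value is its own inverse, one function serves as both tobin and unbin. A single-byte XOR is easy to reverse, so this is obfuscation at best.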
jdmichal
optimization `unbin = tobin` instead of `def unbin` :-D
Nas Banov
@Nas Banov I was wondering if you could do that. I don't use Python at all and was pretty much just copying syntax from the asker. :)
jdmichal
@jdmichal: Yeah, you can. But now I notice something in your code: you can't use `^` on a string. And since we are getting kinky, here is the replacement I suggest: `unbin = tobin = lambda s: ''.join(chr(ord(c) ^ MAGIC) for c in s)`
Nas Banov
@Nas Banov Ah, I assumed the for loop would yield characters, not strings. Thanks for the corrections.
jdmichal
A: 

By the way, base64.b16encode(base64.b16encode(data)) is equivalent, apart from letter case, to data.encode('hex').encode('hex'). And there is a simpler and faster way to do the mapping:

def tobin(numStr):
    return ''.join(("0000","0001","0010","0100","1000","1001","1010","1100","1101","1110")[int(c)] for c in numStr)
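
A note on the hex equivalence mentioned above: str.encode('hex') exists only in Python 2. A Python 3 sketch would use bytes.hex() or base64.b16encode instead. The two differ in letter case (b16encode emits A-F, hex() emits a-f), so the second pass yields different digits, though both outputs still contain only digits:

```python
import base64

data = b"\xab"

upper = base64.b16encode(data)      # b'AB'
lower = data.hex().encode("ascii")  # b'ab'

# Second pass: hex dump of the ASCII codes of the first dump.
upper_twice = base64.b16encode(upper)      # b'4142'
lower_twice = lower.hex().encode("ascii")  # b'6162'
```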

The whole idea of this encoding, while it seems complicated on the surface, is not very good. First, it provides little in the way of encryption, since each character of the first hex dump always maps to the same 8-character string of 0s and 1s:

>>> hexd = '0123456789ABCDEF'
>>> s = hexd.encode('hex')
>>> s
'30313233343536373839414243444546'
>>> s=''.join(["0000","0001","0010","0100","1000","1001","1010","1100","1101","1110"][int(c)] for c in s)
>>> s
'01000000010000010100001001000100010010000100100101001010010011000100110101001110100000011000001010000100100010001000100110001010'
>>> for i in range(0,len(s),8):
...     print hexd[i/8], s[i:i+8], chr(int(s[i:i+8],2))
... 
0 01000000 @
1 01000001 A
2 01000010 B
3 01000100 D
4 01001000 H
5 01001001 I
6 01001010 J
7 01001100 L
8 01001101 M
9 01001110 N
A 10000001 
B 10000010 ‚
C 10000100 „
D 10001000 ˆ
E 10001001 ‰
F 10001010 Š

Second, it inflates the pickled object to 16 times its size! Even if you pack the result by converting every 8 characters of '0' and '1' into one byte (say chr(int(encoded[i:i+8],2))), that is still twice the size of the pickle.
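
The size figures can be checked with a short sketch, assuming Python 3. The `encode` function below is a compact restatement of the question's scheme, and `pack` and the sample object are made up for this example:

```python
import base64
import pickle

# Same digit-to-bits table as the question (index = digit).
CODES = ("0000", "0001", "0010", "0100", "1000",
         "1001", "1010", "1100", "1101", "1110")

def encode(obj):
    """Pickle, hex-dump twice, then map each digit to four '0'/'1' characters."""
    digits = base64.b16encode(base64.b16encode(pickle.dumps(obj))).decode("ascii")
    return "".join(CODES[int(c)] for c in digits)

def pack(bits):
    """Pack each run of eight '0'/'1' characters into one byte."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

obj = {"answer": 42, "items": [1, 2, 3]}
raw = pickle.dumps(obj)
encoded = encode(obj)
packed = pack(encoded)

print(len(raw), len(encoded), len(packed))
```

Each hex dump doubles the length and the digit mapping quadruples it, giving 2 x 2 x 4 = 16 characters per pickled byte, or 2 bytes after packing.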

Nas Banov