I have some base64-encoded data that I want to convert back to binary even if it contains a padding error. If I use

base64.decodestring(b64_string)

it raises an 'Incorrect padding' error. Is there another way?

UPDATE: Thanks for all the feedback. To be honest, all the methods mentioned sounded a bit hit and miss so I decided to try openssl. The following command worked a treat:

openssl enc -d -base64 -in b64string -out binary_data
+2  A: 

If there's a padding error, it probably means your string is corrupted; the length of a base64-encoded string should be a multiple of four. You can try adding the padding character (=) yourself to bring the length up to a multiple of four, but it should already be one unless something is wrong.
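
To illustrate the multiple-of-four rule, here is a small sketch, written for Python 3 (where base64.decodestring is no longer available). The sample inputs are made up for illustration; it only shows that the encoder emits four characters for every group of up to three input bytes.

```python
import base64

# Illustrative inputs of 1 to 4 bytes (not taken from the question's data).
samples = (b"a", b"ab", b"abc", b"abcd")

# Every group of up to 3 input bytes becomes 4 output characters;
# '=' fills out the final group when the input is not a multiple of 3.
encoded = [base64.b64encode(raw).decode("ascii") for raw in samples]
lengths = [len(text) for text in encoded]

for raw, text in zip(samples, encoded):
    print(len(raw), text, len(text) % 4)
```

A length that leaves a remainder when divided by four therefore suggests that characters were lost or added after encoding.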

Michael Mrozek
The underlying binary data is ASN.1. Even with corruption I want to get back to the binary because I can still get some useful info from the ASN.1 stream.
swisstony
A: 

Just add padding as required. Heed Michael's warning, however.

b64_string += "=" * ((4 - len(b64_string) % 4) % 4) #ugh
badp
There's surely something simpler that maps 0 to 0, 2 to 1 and 1 to 2.
badp
Why are you expanding to a multiple of 3 instead of 4?
Michael Mrozek
That's what the wikipedia article on base64 seems to imply.
badp
@bp: In base64 encoding each 24 bits (3 bytes) binary input is encoded as 4 bytes output. output_len % 3 makes no sense.
John Machin
Whoops, I must have misread. Thanks @John :)
badp
+3  A: 

"Incorrect padding" can mean not only "missing padding" but also (believe it or not) "incorrect padding".

If suggested "adding padding" methods don't work, try removing some trailing bytes:

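import base64, binascii
lens = len(strg)
# Drop the trailing partial group, or a whole group of 4 if the length is already a multiple of 4.
lenx = lens - (lens % 4 if lens % 4 else 4)
try:
    result = base64.decodestring(strg[:lenx])
except binascii.Error:
    result = None  # still not decodable; trim further or inspect the data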

Update: Any fiddling around adding padding or removing possibly bad bytes from the end should be done AFTER removing any whitespace, otherwise length calculations will be upset.
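
A minimal sketch of that ordering, assuming Python 3 and the standard alphabet: strip whitespace first, then fix up the length. The helper name lenient_b64decode is hypothetical, not part of the base64 module. It recovers what it can from a damaged tail, but if characters are missing from the middle of the data, every byte after the gap will be misaligned.

```python
import base64
import binascii
import re


def lenient_b64decode(text):
    """Best-effort decode of possibly truncated base64 (hypothetical helper)."""
    # Remove whitespace first so the length arithmetic below is meaningful.
    cleaned = re.sub(r"\s+", "", text)
    # Discard existing padding so it can be recomputed from scratch.
    cleaned = cleaned.rstrip("=")
    # A remainder of 1 can never be valid base64; drop the dangling character.
    if len(cleaned) % 4 == 1:
        cleaned = cleaned[:-1]
    # Pad back up to a multiple of four.
    cleaned += "=" * (-len(cleaned) % 4)
    try:
        return base64.b64decode(cleaned)
    except binascii.Error:
        return None  # caller decides what to do with undecodable input


print(lenient_b64decode("aGVs\nbG8"))
```

Returning None rather than raising is a design choice for this sketch; re-raising may be preferable if the caller needs to know why decoding failed.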

It would be a good idea if you showed us a (short) sample of the data that you need to recover. Edit your question and copy/paste the result of print repr(sample).

Update 2: It is possible that the encoding has been done in a URL-safe manner. If this is the case, you will be able to see minus and underscore characters in your data, and you should be able to decode it by using base64.b64decode(strg, '-_')

If you can't see minus and underscore characters in your data, but can see plus and slash characters, then you have some other problem, and may need the add-padding or remove-cruft tricks.

If you can see none of minus, underscore, plus and slash in your data, then you need to determine the two alternate characters; they'll be the ones that aren't in [A-Za-z0-9]. Then you'll need to experiment to see which order they need to be used in the 2nd arg of base64.b64decode()
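
As a rough illustration of the alternate-alphabet point, the sketch below (assuming Python 3) encodes three made-up bytes chosen so that every output character is one of the two variable ones, then decodes the URL-safe form by passing the alternate characters as the second argument.

```python
import base64

# Illustrative bytes whose encoding uses only alphabet positions 62 and 63.
raw = b"\xfb\xff\xfe"

standard = base64.b64encode(raw).decode("ascii")
urlsafe = base64.urlsafe_b64encode(raw).decode("ascii")
print(standard, urlsafe)

# The second argument names the characters that stand in for '+' and '/'.
decoded_alt = base64.b64decode(urlsafe, b"-_")
decoded_helper = base64.urlsafe_b64decode(urlsafe)
```

The order of the two characters matters: the first replaces + and the second replaces /, so swapping them decodes to different bytes.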

Update 3: If your data is "company confidential":
(a) you should say so up front
(b) we can explore other avenues in understanding the problem, which is highly likely to be related to what characters are used instead of + and / in the encoding alphabet, or by other formatting or extraneous characters.

One such avenue would be to examine what non-"standard" characters are in your data, e.g.

import string
from collections import defaultdict

# Count every character that is not an ASCII letter or digit.
d = defaultdict(int)
s = set(string.ascii_letters + string.digits)
for c in your_data:
    if c not in s:
        d[c] += 1
print(d)
John Machin
The data consists of characters from the standard base64 character set. I'm pretty sure the problem is that one or more characters are missing, hence the padding error. Unless there is a robust solution in Python, I'll go with my solution of calling openssl.
swisstony
A "solution" that silently ignores errors is scarcely deserving of the term "robust". As I mentioned earlier, the various Python suggestions were methods of DEBUGGING to find out what the problem is, preparatory to a PRINCIPLED solution ... aren't you interested in such a thing?
John Machin
My requirement is NOT to solve the problem of why the base64 is corrupt; it comes from a source I have no control over. My requirement is to provide information about the data received even if it is corrupt. One way to do this is to get the binary data out of the corrupt base64 so I can glean information from the underlying ASN.1 stream. I asked the original question because I wanted an answer to that question, not an answer to another question such as how to debug corrupt base64.
swisstony