ansaurus

Question

Python: Ignore 'Incorrect padding' error when base64 decoding

Answer 1

+2 A:

If there's a padding error, it probably means your string is corrupted; a base64-encoded string should have a length that is a multiple of four. You can try adding the padding character (=) yourself to bring the length up to a multiple of four, but the string should already have that unless something is wrong.

Michael Mrozek 2010-05-31 07:32:50

The underlying binary data is ASN.1. Even with corruption I want to get back to the binary because I can still get some useful info from the ASN.1 stream.

swisstony 2010-05-31 07:57:45

Answer 2

A:

Just add padding as required. Heed Michael's warning, however.

b64_string += "=" * ((4 - len(b64_string) % 4) % 4) #ugh
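
A minimal sketch of the same idea wrapped in a function, assuming Python 3 and an input that contains only base64 alphabet characters. The name pad_b64 is hypothetical, not part of the standard library. The expression -len(s) % 4 yields the same pad count as the one above.

```python
import base64

def pad_b64(b64_string):
    """Append '=' until the length is a multiple of four (hypothetical helper)."""
    return b64_string + "=" * (-len(b64_string) % 4)

# "aGVsbG8" is b"hello" encoded with its trailing "=" stripped.
print(base64.b64decode(pad_b64("aGVsbG8")))
```

Padding alone cannot rescue a string whose length is one more than a multiple of four: a single leftover character carries only 6 bits, which is not enough for a byte, so the decoder will still reject it.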

badp 2010-05-31 07:37:09

There's surely something simpler that maps 0 to 0, 2 to 1 and 1 to 2.

badp 2010-05-31 07:37:30

Why are you expanding to a multiple of 3 instead of 4?

Michael Mrozek 2010-05-31 07:43:16

That's what the wikipedia article on base64 seems to imply.

badp 2010-05-31 08:55:40

@bp: In base64 encoding, each 24 bits (3 bytes) of binary input is encoded as 4 bytes of output. output_len % 3 makes no sense.

John Machin 2010-05-31 10:10:24

Whoops, I must have misread. Thanks @John :)

badp 2010-05-31 10:25:49

Answer 3

+3 A:

"Incorrect padding" can mean not only "missing padding" but also (believe it or not) "incorrect padding".

If suggested "adding padding" methods don't work, try removing some trailing bytes:

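import base64
import binascii

lens = len(strg)
# Trim to a multiple of 4; if it already is one, drop the last 4 characters.
lenx = lens - (lens % 4 if lens % 4 else 4)
try:
    result = base64.b64decode(strg[:lenx])
except (TypeError, binascii.Error):  # Python 2 raises TypeError
    result = None  # handle the failure as appropriate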

Update: Any fiddling around adding padding or removing possibly bad bytes from the end should be done AFTER removing any whitespace; otherwise the length calculations will be upset.
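
A rough sketch of that ordering, assuming Python 3 and text (str) input in the standard alphabet. The helper name salvage_b64 is hypothetical. It differs from the snippet above in one respect: it drops at most one dangling character and then re-pads, rather than trimming a whole trailing group.

```python
import base64
import binascii
import re

def salvage_b64(data):
    """Best-effort decode of damaged base64 text (hypothetical helper)."""
    # Remove whitespace, old '=' padding, and anything else that is not
    # in the standard alphabet BEFORE doing any length arithmetic.
    cleaned = re.sub(r"[^A-Za-z0-9+/]", "", data)
    # One leftover character holds only 6 bits and cannot form a byte.
    if len(cleaned) % 4 == 1:
        cleaned = cleaned[:-1]
    # Re-pad to a multiple of four.
    cleaned += "=" * (-len(cleaned) % 4)
    try:
        return base64.b64decode(cleaned)
    except binascii.Error:
        return None

print(salvage_b64("aGVs\nbG8=\n"))
```

This silently discards damage instead of reporting it, so the bytes it returns may be wrong from the point of corruption onward. It is only suitable when partial data is better than none.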

It would be a good idea if you showed us a (short) sample of the data that you need to recover. Edit your question and copy/paste the result of print(repr(sample)).

Update 2: It is possible that the encoding has been done in a URL-safe manner. If this is the case, you will be able to see minus and underscore characters in your data, and you should be able to decode it by using base64.b64decode(strg, '-_').

If you can't see minus and underscore characters in your data, but can see plus and slash characters, then you have some other problem, and may need the add-padding or remove-cruft tricks.

If you can see none of minus, underscore, plus and slash in your data, then you need to determine the two alternate characters; they'll be the ones that aren't in [A-Za-z0-9]. Then you'll need to experiment to see in which order they must appear in the 2nd arg of base64.b64decode().
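
A small sketch of the first two cases, assuming Python 3 and correctly padded str input. The function name decode_any_alphabet is hypothetical; base64.urlsafe_b64decode is the standard-library call for the '-_' alphabet.

```python
import base64

def decode_any_alphabet(strg):
    """Pick the URL-safe decoder if '-' or '_' appear (hypothetical helper)."""
    if "-" in strg or "_" in strg:
        return base64.urlsafe_b64decode(strg)
    return base64.b64decode(strg)

# The bytes FB FF encode to "+/8=" in the standard alphabet
# and to "-_8=" in the URL-safe one.
print(decode_any_alphabet("-_8=") == decode_any_alphabet("+/8="))
```

Data that uses some other pair of alternate characters would still need the experimentation described above.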

Update 3: If your data is "company confidential":
(a) you should say so up front
(b) we can explore other avenues in understanding the problem, which is highly likely to be related to what characters are used instead of + and / in the encoding alphabet, or to other formatting or extraneous characters.

One such avenue would be to examine what non-"standard" characters are in your data, e.g.

import string
from collections import defaultdict

# Count every character outside [A-Za-z0-9]; your_data is assumed to be text (str).
d = defaultdict(int)
s = set(string.ascii_letters + string.digits)
for c in your_data:
    if c not in s:
        d[c] += 1
print(d)

John Machin 2010-05-31 07:49:48

The data consists of the standard base64 character set. I'm pretty sure the problem is that one or more characters are missing, hence the padding error. Unless there is a robust solution in Python, I'll go with my solution of calling openssl.

swisstony 2010-06-02 13:13:37

A "solution" that silently ignores errors is scarcely deserving of the term "robust". As I mentioned earlier, the various Python suggestions were methods of DEBUGGING to find out what the problem is, preparatory to a PRINCIPLED solution ... aren't you interested in such a thing?

John Machin 2010-06-02 13:32:32

My requirement is NOT to solve the problem of why the base64 is corrupt; it comes from a source I have no control over. My requirement is to provide information about the data received even if it is corrupt. One way to do this is to get the binary data out of the corrupt base64 so I can glean information from the underlying ASN.1 stream. I asked the original question because I wanted an answer to that question, not the answer to another question such as how to debug corrupt base64.

swisstony 2010-06-02 14:01:06
