Running into strangeness with get_payload: it seems to crap out when it sees an equal sign in the message it's decoding. Here's code that displays the error:

import email

data = file('testmessage.txt').read()
msg  = email.message_from_string( data )
payload = msg.get_payload(decode=True)
print payload

And here's a sample message: test message.

The message is printed only up to the first "=". The rest is omitted. Anybody know what's going on?

The same script with "decode=False" returns the full message, so it appears the decode is unhappy with the equal sign.

This is under Python 2.5.

A:

You have a line endings problem. The body of your test message uses bare carriage returns (\r) without newlines (\n). If you fix up the line endings before parsing the email, it all works:

import email, re
data = file('testmessage.txt').read()
data = re.sub(r'\r(?!\n)', '\r\n', data)  # Bare \r (not already followed by \n) becomes \r\n
msg  = email.message_from_string( data )
payload = msg.get_payload(decode=True)
print payload
RichieHindle
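To see what the substitution in the answer does on its own, here is a minimal sketch. The helper name normalize_line_endings and the sample string are made up for illustration and are not from the original test message. It relies on the negative lookahead `(?!\n)`, so a `\r` that is already part of a `\r\n` pair is left alone.

```python
import re

def normalize_line_endings(text):
    """Turn every bare carriage return into CRLF; existing CRLF pairs are untouched."""
    return re.sub(r'\r(?!\n)', '\r\n', text)

# Hypothetical sample: a mix of proper CRLF endings and bare CRs.
sample = 'Subject: test\r\n\r\nline one\rline two\r\nline three\r'
fixed = normalize_line_endings(sample)
print(repr(fixed))
```

Running the helper a second time on its own output changes nothing, so it should be safe to apply to messages whose line endings are already correct.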
Thanks Richie, that works. However, I'll also be dealing with attachments that are not text, so I probably don't want to do the re substitution indiscriminately. I'll need to detect text/plain and only do the substitution then, which is a bit subtle since by the time I see the mime type for the message part I'm already past the message_from_string call. Is it possible to call decode separately outside of the get_payload call?
Parand
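On decoding separately from get_payload: one possible approach, sketched below, is to fetch the still-encoded payload with decode=False, look at the part's content type, and then decode by hand with the standard quopri and base64 modules. The helper name decode_part and the sample message are assumptions for illustration. The sketch is written for Python 3, and it does not verify whether normalizing at this stage is enough to avoid the truncation seen on Python 2.5.

```python
import base64
import email
import quopri
import re

def decode_part(part):
    """Decode one non-multipart part by hand instead of get_payload(decode=True)."""
    raw = part.get_payload(decode=False)        # still transfer-encoded text
    if part.get_content_maintype() == 'text':
        raw = re.sub(r'\r(?!\n)', '\r\n', raw)  # fix bare \r only in text parts
    encoding = part.get('Content-Transfer-Encoding', '').strip().lower()
    data = raw.encode('ascii')                  # assumes the encoded payload is pure ASCII
    if encoding == 'quoted-printable':
        return quopri.decodestring(data)
    if encoding == 'base64':
        return base64.b64decode(data)
    return data                                 # 7bit/8bit/unknown: return as-is

# Hypothetical message with bare \r line endings in a quoted-printable body.
raw_message = (
    'MIME-Version: 1.0\r\n'
    'Content-Type: text/plain; charset="us-ascii"\r\n'
    'Content-Transfer-Encoding: quoted-printable\r\n'
    '\r\n'
    'a =3D b\rsecond line\r'
)
msg = email.message_from_string(raw_message)
decoded = [decode_part(p) for p in msg.walk() if not p.is_multipart()]
for body in decoded:
    print(repr(body))
```

Because msg.walk() visits every part, the content type check happens per part, after parsing, which is the point at which the MIME type is known.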
Are you sure you'll ever be dealing with true binary attachments? Attachments are usually encoded within the email using base64 or similar, so although they represent a binary file, they're encoded as text within the email.
RichieHindle
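The point about attachments can be checked with a small sketch, written for Python 3 and using a made-up binary payload. It builds a part with email.mime.application.MIMEApplication, which base64-encodes its data by default, and shows that the serialized text contains no carriage returns for the substitution to touch, even though the original bytes include \r and \n.

```python
import email
import re
from email.mime.application import MIMEApplication

# Hypothetical attachment: every byte value, including \r (13) and \n (10).
binary = bytes(range(256))

part = MIMEApplication(binary)   # base64 transfer encoding by default
wire = part.as_string()          # what actually travels inside the email

# Apply the same substitution as in the answer to the serialized message.
fixed = re.sub(r'\r(?!\n)', '\r\n', wire)

# Parse the result back and decode the attachment.
roundtrip = email.message_from_string(fixed).get_payload(decode=True)

print('\r' in wire)          # is there any carriage return in the encoded text?
print(roundtrip == binary)   # did the attachment survive the substitution?
```

This only covers base64-encoded parts. A part sent with an 8bit or binary transfer encoding would carry its raw bytes in the message, and the substitution could alter those.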
You're right again; I tested with a decent number of examples and they all work fine with the substitution you suggest. Thanks again.
Parand