My application is written in Python. It runs a script on each email received by Postfix and does something with the email content. Procmail is responsible for running the script, passing the email as input. The problem started when I tried converting the input message (which may be plain text) to an email message object, because the latter comes in handy. I am using email.message_from_string (where email is the standard email module that ships with Python).


import email
message = email.message_from_string(original_mail_content)
message_body = message.get_payload()

This message_body is sometimes a list ([email.message.Message instance, email.message.Message instance]) and sometimes a string (the actual body content of the incoming email). Why is that? I also noticed one more thing. When I was browsing the email.message.Message.get_payload() docstring, I found this:
"""The payload will either be a list object or a string. If you mutate the list object, you modify the message's payload in place....."""

So how do I write a generic method to get the body of an email in Python? Please help me out.

+9  A: 

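As crazy as it might seem, the reason for the sometimes-string, sometimes-list semantics is given in the documentation. Basically, the payload of a multipart message is returned as a list of Message objects, while the payload of a non-multipart message is returned as a string.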
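
To make the distinction concrete, here is a minimal sketch in Python 3 syntax. The two raw messages are made up for illustration (they are not from the question), and is_multipart() is used to tell the two cases apart before touching the payload.

```python
from email import message_from_string

# Hypothetical raw messages, invented purely for illustration.
plain_raw = (
    "From: a@example.com\n"
    "Subject: plain\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello there.\n"
)
multipart_raw = (
    "From: a@example.com\n"
    "Subject: multi\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello there.\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello there.</p>\n"
    "--XYZ--\n"
)

plain = message_from_string(plain_raw)
multi = message_from_string(multipart_raw)

# Non-multipart message: get_payload() is a string.
plain_payload = plain.get_payload()

# Multipart message: get_payload() is a list of Message objects,
# one per sub-part, each with its own headers and payload.
multi_payload = multi.get_payload()
part_types = [part.get_content_type() for part in multi_payload]
```

Checking is_multipart() first is usually safer than testing the type of the returned payload by hand.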

unwind
+2  A: 

Well, the answers are correct, you should read the docs, but for an example of a generic way:

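def get_first_text_part(msg):
    # Returns the payload of the first text part, or None if there is none.
    maintype = msg.get_content_maintype()
    if maintype == 'multipart':
        # For multipart messages, get_payload() is a list of sub-messages.
        for part in msg.get_payload():
            if part.get_content_maintype() == 'text':
                return part.get_payload()
    elif maintype == 'text':
        # For a plain text message, get_payload() is the body string.
        return msg.get_payload()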

This is prone to some disaster, as the parts themselves might be multipart, and it only returns the first text part, which may not be the one you want, but you can play with it.

Ali A
In the list of messages I mentioned, I tried calling get_payload() on each of the objects. Both return the same thing. Is one object a kind of clone of the other, so that calling get_payload() on a single part will do?
Maddy
Depends what you've been sent. You might commonly, for example, get a text/html and a text/plain version of the same thing. You could modify the function to look for and prefer a text/plain content type over other text/ types.
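
A rough sketch of that modification, in Python 3 syntax. The function name get_preferred_text_part and the sample message are hypothetical, and like the original function it only looks at top-level parts, so nested multiparts are still not handled.

```python
from email import message_from_string


def get_preferred_text_part(msg):
    """Return the text/plain payload if present, else the first text/* payload, else None."""
    if msg.get_content_maintype() == "text":
        return msg.get_payload()
    if not msg.is_multipart():
        return None
    # Only the immediate sub-parts are inspected (no recursion).
    text_parts = [
        part for part in msg.get_payload()
        if part.get_content_maintype() == "text"
    ]
    for part in text_parts:
        if part.get_content_type() == "text/plain":
            return part.get_payload()
    # No text/plain found: fall back to the first text/* part, if any.
    return text_parts[0].get_payload() if text_parts else None


# Made-up message where the HTML alternative comes first.
sample = message_from_string(
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hi</p>\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hi\n"
    "--XYZ--\n"
)
preferred = get_preferred_text_part(sample)
```

Here the text/plain alternative is chosen even though the text/html one appears first in the message.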
bobince
Awesome bobince. You are absolutely right :D
Maddy
Also be aware that there's no "correct" structure for a message. Different mail clients will structure them differently, as well as set different headers.
Richard Levasseur
A: 

Rather than simply looking for a sub-part, use walk() to iterate through the message contents:

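def walkMsg(msg):
  for part in msg.walk():
    # Multipart containers are just glue; their decoded payload is None.
    if part.get_content_maintype() == "multipart":
      continue
    # decode=1 undoes any base64 / quoted-printable transfer encoding.
    yield part.get_payload(decode=1)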

The walk() method returns an iterator that you can loop over (it is implemented as a generator). If the message is not a container of parts (i.e. has no attachments or alternates), walk() yields a single element: the message itself.

You want to skip any 'multipart' parts as they are just glue.

The above method returns all readable parts. You may want to expand this to simply return the text parts if they contain the info you are seeking.
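
One possible way to narrow this down to text parts only, sketched in Python 3 syntax (where get_payload(decode=True) returns bytes). The name iter_text_parts, the UTF-8 fallback for a missing or unknown charset, and the sample message are all assumptions made for illustration.

```python
from email import message_from_string


def iter_text_parts(msg):
    """Yield (content_type, decoded_text) for every text/* part in msg."""
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skips multipart containers and non-text attachments
        raw = part.get_payload(decode=True)  # bytes, transfer encoding undone
        # Assumption: fall back to UTF-8 when no charset is declared.
        charset = part.get_content_charset() or "utf-8"
        try:
            text = raw.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name in the header: same UTF-8 fallback.
            text = raw.decode("utf-8", errors="replace")
        yield part.get_content_type(), text


# Made-up message: one base64-encoded plain part and one HTML part.
sample = message_from_string(
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    'Content-Type: text/plain; charset="utf-8"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "SGVsbG8gd29ybGQ=\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello world</p>\n"
    "--XYZ--\n"
)
text_parts = list(iter_text_parts(sample))
```

Yielding the content type alongside the text lets the caller decide which alternative to keep.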

Note that as of Python 2.5, the methods get_type(), get_main_type(), and get_subtype() have been removed; use get_content_type(), get_content_maintype(), and get_content_subtype() instead. See http://docs.python.org/library/email.message.html#email.message.Message.walk

timbo