I've been playing around with Python's imaplib and email modules recently, sending and receiving large emails with most of the data in the body of the email rather than in attachments.
However, I've noticed a problem when I download large emails (greater than about 8 MB) from the server and parse them with email.message_from_string(). That call takes a really long time: around 300-310 seconds on average for a 16 MB email. Sending such a large email, by contrast, only takes about 40 seconds. Again, all the data is in the body of the email, not in attachments; if I download an email of the same size with all the data in attachments instead, the entire operation finishes in 30-40 seconds. This is what I'm doing:
buf = []
# search for matching message numbers, then fetch each full message (RFC822)
t, d = mailacct.search(None, 'SUBJECT', subj)
for num in d:
    t, msg = mailacct.fetch(num, '(RFC822)')
    for resp in msg:
        if isinstance(resp, tuple):
            # resp[1] is the raw message text; parse it into a Message object
            buf.append(email.message_from_string(resp[1]))
I've timed each part of the code separately. mailacct.search and mailacct.fetch both finish in about 30-40 seconds for a 16 MB email. The line with email.message_from_string(resp[1]) takes around 280-300 seconds.
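If it helps, the parse step can be exercised on its own, with no IMAP traffic involved. A minimal self-contained sketch (the synthetic 'x' body here is just a stand-in for my real payload, which is random binary data):

import email
import time

# Build a synthetic ~16 MB single-part message so that only
# email.message_from_string() is being measured.
headers = 'From: a@example.com\r\nTo: b@example.com\r\nSubject: big\r\n\r\n'
raw = headers + ('x' * (16 * 1024 * 1024))

start = time.time()
msg = email.message_from_string(raw)
print('message_from_string: %.1f seconds' % (time.time() - start))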
I'm a Python noob, so am I doing something really inefficient in the code above? Or does the problem lie with the email.message_from_string() method itself, perhaps an inefficient implementation? Or were email bodies simply never meant to hold this much data, hence the poor performance?
* EDIT *: Additional info: I'm using imaplib.IMAP4_SSL to create the IMAP connections, I first uploaded the messages to the account with the connection's append() method, and the payload is randomly generated binary data.
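The upload side looked roughly like this (a sketch: host, credentials, and mailbox are placeholders, and the base64 encoding of the random payload is illustrative rather than the exact code I used):

import base64
import imaplib
import os

# Connect over SSL (host and credentials are placeholders)
mailacct = imaplib.IMAP4_SSL('imap.example.com')
mailacct.login('user', 'password')

# ~16 MB of random binary data, base64-encoded so it fits in a text body
body = base64.b64encode(os.urandom(16 * 1024 * 1024))
raw = b'From: me@example.com\r\nSubject: big test\r\n\r\n' + body

# append(mailbox, flags, date_time, message) uploads the message directly
mailacct.append('INBOX', None, None, raw)
mailacct.logout()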