I've been playing around with Python's imaplib and email module recently. I tried sending and receiving large emails (with most of the data in the body of the email rather than attachments) using the imaplib/email modules.

However, I've noticed a problem when I download large emails (larger than about 8 MB) from the email server and parse them using the "email.message_from_string()" method. That method takes a really long time (an average of around 300-310 seconds for a 16 MB email). Note: Sending such a large email doesn't take too much time, approximately 40 seconds. Again, all the data is in the body of the email, not in attachments. If I download the same email with all the data as attachments, the entire operation finishes in 30-40 seconds. This is what I'm doing:

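import email

buf = []
t, d = mailacct.search(None, 'SUBJECT', subj)
# d[0] is a single space-separated string of message numbers
for num in d[0].split():
    t, msg = mailacct.fetch(num, '(RFC822)')

    for resp in msg:
        # message data arrives as (envelope, raw_message) tuples
        if isinstance(resp, tuple):
            buf.append(email.message_from_string(resp[1]))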

I've timed each part of the code separately. mailacct.search and mailacct.fetch both finish in about 30-40 seconds for a 16 MB email. The line with email.message_from_string(resp[1]) takes around 280-300 seconds.
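For anyone who wants to reproduce the parsing measurement without an IMAP server, here is a rough sketch that times email.message_from_string() on a synthetic message. The helper names make_message and time_parse are made up for illustration, the body is base64 text rather than the raw random binary used in the question, and the syntax assumes Python 3, so the timings will not match the figures above.

```python
import base64
import email
import os
import time


def make_message(body_bytes):
    """Build a raw message string whose body is base64-encoded random data."""
    payload = base64.encodebytes(os.urandom(body_bytes)).decode("ascii")
    return "Subject: timing test\n\n" + payload


def time_parse(raw):
    """Return the parsed message and the seconds spent in message_from_string()."""
    start = time.perf_counter()
    msg = email.message_from_string(raw)
    return msg, time.perf_counter() - start


# 256 KiB of random data keeps the run short; raise it to probe larger bodies.
raw = make_message(256 * 1024)
msg, elapsed = time_parse(raw)
print(f"parsed {len(raw)} characters in {elapsed:.3f} s")
```

Increasing the size step by step shows whether parse time grows roughly in proportion to the message size or much faster than that.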

I'm a Python noob. So am I doing something really inefficient in the above code? Or does the problem lie with the email.message_from_string() method, perhaps an inefficient implementation? Or could it be that email bodies were never meant to contain large amounts of data, hence the poor performance?

* EDIT *: Additional info: I used imaplib.IMAP4_SSL for creating IMAP connections. I used imaplib.append() to upload messages to the email account first. I used randomly generated binary data for the payload.

A: 

Okay, I did some digging on my own by examining the source code for the email module. When email.message_from_string() is called, the actual work is done by the parse() function in email/parser.py. It reads the string in blocks of 8192 bytes and processes them one at a time, which seems to be why large messages take so long. I changed the code so that it read and processed the whole string at once, and the time taken to process the large email message improved tremendously.

I'm assuming it was originally written to process the input in blocks of 8192 so that it could handle extremely large strings? Is there a better way to do this than changing the email module source code?
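One possible way to get the same effect without patching the standard library is to use the incremental parser in email.feedparser directly and hand it the whole string in a single feed() call. This is only a sketch: parse_whole is a made-up helper name, the syntax assumes Python 3, and I have not measured whether it gives the same speedup as the source change described above.

```python
from email.feedparser import FeedParser


def parse_whole(text):
    """Parse a complete message string with one feed() call."""
    parser = FeedParser()
    parser.feed(text)      # the whole string at once, no 8192-sized blocks
    return parser.close()  # returns the root Message object


# Small stand-in message; in the code from the question this would be resp[1].
sample = "Subject: test\n\nbody line 1\nbody line 2\n"
msg = parse_whole(sample)
print(msg["Subject"])
```

On Python 3, imaplib returns the message as bytes rather than str, so the BytesFeedParser class from the same module would be the counterpart to try there.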

Jagan Srinivasan