ansaurus

Question

Parse and display MIME multipart email on website

Answer 1

+1 A:

It doesn't sound like a difficult job to me:

use Email::MIME;
# $message is assumed to hold the raw text of the email.
my $parsed = Email::MIME->new($message);
my @parts = $parsed->parts; # These will be Email::MIME objects, too.
print <<EOF;
<html><head><title>!</title></head><body>
EOF
for my $part (@parts) {
    # Check each part's own type. The header may carry parameters such as
    # "; charset=...", so match the media type prefix rather than using eq.
    my $content_type = $part->content_type || 'text/plain';
    if ($content_type =~ m{^text/plain\b}i) {
        print "<pre>", $part->body, "</pre>\n";
    }
    elsif ($content_type =~ m{^text/html\b}i) {
        print $part->body;
    }
    # Handle some more cases here
}
print <<EOF;
</body></html>
EOF

Kinopiko 2010-06-18 12:10:07

You will need to sanitize it though. You don't want to let an email inject arbitrary JS into your site!

David Dorward 2010-06-18 12:15:44

Not to mention encoding entities in a text/plain part.

cjm 2010-06-18 15:16:47

Still doesn't sound like a big job to me. Entities are this tough: `s/([<>"/ge` and HTML sanitizing via `HTML::Scrubber`.
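
A minimal sketch of the entity-escaping half, using only core Perl. The helper name escape_html and the exact set of escaped characters are assumptions made for illustration. A text/html part would still need a real sanitizer such as HTML::Scrubber, which is not shown here.

```perl
use strict;
use warnings;

# Characters that are unsafe inside HTML text or attribute values.
my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

# escape_html is a hypothetical helper, not part of any module.
sub escape_html {
    my ($text) = @_;
    $text =~ s/([&<>"'])/$ENTITY{$1}/g;
    return $text;
}

# Wrap an escaped text/plain body in <pre>, as in the answer above.
print "<pre>", escape_html(q{if (a < b && c > "d") { ... }}), "</pre>\n";
```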

Kinopiko 2010-06-18 15:23:13

That's what I'm planning to do, but it's the "Handle some more cases here" that I'm worried about. I don't know enough about MIME multipart and all the different types to not screw it up. I want to display attachments as paper clips etc... But perhaps I'm being overly paranoid... Thanks for your help.

aidan 2010-06-18 16:23:46

Answer 2

+1 A:

Reuse existing complete software. The MHonArc mail-to-HTML converter has excellent MIME support.

daxim 2010-06-18 12:22:34

This might be exactly what I'm looking for actually. Will investigate... Thanks.

aidan 2010-06-18 16:28:10

Answer 3

+2 A:

I dealt with this exact problem a few months ago. I added an email feature, both sending and receiving, to the product I work on. The first part was sending reminders to users. We didn't want to manage the bounces on behalf of our customers' admins, so we decided to provide a message inbox where the admins could see bounces and replies without involving us, and adjust email addresses themselves if they needed to.

Because of this, we accept all email that is sent to an inbox we watch. We use VERP (variable envelope return path) to associate an email with a user, and store the entire email as is in the database. Then, when the admin asks to see the email, we have to parse it.

My first attempt was very similar to an earlier answer: if one of the parts is HTML, show it; if it's text, show it; otherwise, show the original, raw email. This broke down quickly with emails not generated by sendmail. Outlook, Exchange, and a few other email systems don't produce such simple messages; they use multipart sections to send the email. After a lot of digging and cussing, I found that the problem doesn't appear to be well documented. With the help of reading through MHonArc and the RFCs (RFC 2045 and RFC 2046), I settled on the solution below. I decided against using MHonArc itself, since I couldn't easily reuse its parsing and display functionality. I wouldn't say this is perfect, but it's been good enough for us to use.

First, take the message and use Email::MIME to parse it. Then call a function, get_parts, with the array of parts Email::MIME gives you from ->parts().

For each part it is passed, get_parts decodes the content type and looks it up in a hash. If an entry exists, it calls the function associated with that content type. If that decoder returns something, the result is pushed onto a result array.

The last piece of the puzzle is this decoder hash. Basically, it defines the content types I can deal with:

text/html
text/plain
message/delivery-status, which is actually also plain text
multipart/mixed
multipart/related
multipart/alternative

The non-multipart sections I return as is. For mixed, related and alternative, I merely call get_parts on that MIME node's parts and return the results. Because alternative is special, it has some extra code after calling get_parts: it returns only the HTML part if it has one, otherwise only the text part if it has one. If it has neither, it won't return anything valid.

The advantage of the hash of valid content types is that I can easily add logic for more types as needed. And by the time get_parts is done, you should have an array of all the content you care about.
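
As a rough illustration of the approach described above (not the code we actually run), here is a minimal sketch. The helper names media_type, leaf, container and alternative, the shape of the result hashes, and the sample message are all assumptions made for the example. It recurses with subparts rather than parts, because parts hands back the object itself when there are no children.

```perl
use 5.010;
use strict;
use warnings;
use Email::MIME;

my %decoder;    # media type => handler; declared first because handlers recurse

# Lower-cased "type/subtype" without parameters; RFC 2045 defaults to text/plain.
sub media_type {
    my ($part) = @_;
    my $header = $part->content_type // '';
    my ($type) = lc($header) =~ m{^\s*([^;\s]+)};
    return $type // 'text/plain';
}

# Run every part through its handler, silently skipping unknown types.
sub get_parts {
    my @found;
    for my $part (@_) {
        my $handler = $decoder{ media_type($part) } or next;
        push @found, $handler->($part);
    }
    return @found;
}

# Non-multipart content is returned as is (transfer-decoded, NOT sanitized).
sub leaf {
    my ($part) = @_;
    return { type => media_type($part), body => $part->body };
}

# mixed and related: just descend into the children.
sub container {
    my ($part) = @_;
    return get_parts( $part->subparts );
}

# alternative: prefer HTML, fall back to plain text, otherwise return nothing.
sub alternative {
    my ($part) = @_;
    my @found = get_parts( $part->subparts );
    for my $wanted ( 'text/html', 'text/plain' ) {
        my @match = grep { $_->{type} eq $wanted } @found;
        return @match if @match;
    }
    return;
}

%decoder = (
    'text/html'               => \&leaf,
    'text/plain'              => \&leaf,
    'message/delivery-status' => \&leaf,
    'multipart/mixed'         => \&container,
    'multipart/related'       => \&container,
    'multipart/alternative'   => \&alternative,
);

# Made-up sample: an alternative (plain + HTML) plus an unsupported attachment.
my $raw = <<'EOM';
From: sender@example.com
To: inbox@example.com
Subject: demo
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="outer"

--outer
Content-Type: multipart/alternative; boundary="inner"

--inner
Content-Type: text/plain; charset=us-ascii

plain version
--inner
Content-Type: text/html; charset=us-ascii

<p>html version</p>
--inner--
--outer
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

AAEC
--outer--
EOM

my @content = get_parts( Email::MIME->new($raw)->parts );
printf "%s: %d bytes\n", $_->{type}, length $_->{body} for @content;
```

For the sample message this should yield just the text/html alternative; the attachment is skipped because its type has no handler. Supporting another type is then a matter of adding one more entry to %decoder.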

One more item I should mention: as a part of this, we created a separate domain that actually serves these messages. The main domain that an admin works on will refuse to serve the message and instead redirects the browser to our user content domain, which serves only user content. This helps the browser properly sandbox the content away from our main domain. See the same origin policy (http://en.wikipedia.org/wiki/Same_origin_policy).

atrodo 2010-06-18 20:36:36
