Hello,

I am using the program below to sort and eventually print email messages. Some messages contain attachments or HTML markup, neither of which prints well. Is there an easy way to strip the attachments and remove the HTML tags while keeping the text they format?

#!/usr/bin/perl
use warnings;
use strict;
use Mail::Box::Manager;

open(MYFILE, '>>', 'data.txt') or die "data.txt: Unable to open: $!\n";
binmode(MYFILE, ':encoding(UTF-8)');


my $file = shift || $ENV{MAIL};
my $mgr = Mail::Box::Manager->new(
    access          => 'r',
);

my $folder = $mgr->open( folder => $file )
or die "$file: Unable to open: $!\n";

for my $msg ( sort { $a->timestamp <=> $b->timestamp } $folder->messages)
{
    my $to          = join( ', ', map { $_->format } $msg->to );
    my $from        = join( ', ', map { $_->format } $msg->from );
    my $date        = localtime( $msg->timestamp );
    my $subject     = $msg->subject;
    my $body        = $msg->decoded->string;

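    # Strip all quoted text (lines starting with ">"). No /s flag here:
    # with /s, ".*" would run past the line end and delete the rest of the body.
    $body =~ s/^>.*$//mg;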

    print MYFILE <<"";
From: $from
To: $to
Date: $date
Subject: $subject
\n
$body

}
+1  A: 

Stripping HTML is covered in perlfaq9 (it is the first item returned by perldoc -q html). Briefly, the relevant modules are HTML::Parser and HTML::FormatText.
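A minimal sketch of the HTML::FormatText route, assuming HTML::TreeBuilder and HTML::FormatText are installed from CPAN. The helper name html_to_text and the 72-column right margin are illustrative choices, not something taken from the FAQ:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# html_to_text is a hypothetical helper name: it parses an HTML string
# and renders it as plain text, dropping the tags but keeping the words.
sub html_to_text {
    my ($html) = @_;

    my $tree      = HTML::TreeBuilder->new_from_content($html);
    my $formatter = HTML::FormatText->new(
        leftmargin  => 0,
        rightmargin => 72,    # assumed width for a printed page
    );
    my $text = $formatter->format($tree);

    $tree->delete;            # free the parse tree explicitly
    return $text;
}

print html_to_text('<p>Hello, <b>world</b>!</p>');
```

The formatter also does some simple layout (paragraph breaks, wrapped lines), which is usually what you want for printing.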

As for attachments: emails with attachments are sent as MIME. From this example, you can see that the format is simple enough that you could write your own solution fairly easily, or you could look at the MIME modules on CPAN.

Cirno de Bergerac
+3  A: 

Mail::Message::isMultipart will tell you whether a given message consists of several parts, as any message with attachments does. Mail::Message::parts will give you a list of the mail parts.

Thus:

if ( $msg->isMultipart ) {
    foreach my $part ( $msg->parts ) {
        if ( $part->contentType eq 'text/html' ) {
           # deal with html here.
        }
        elsif ( $part->contentType eq 'text/plain' ) {
           # deal with text here.
        }
        else {
           # well?
        }
    }
}
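
One way the branches above might be filled in, as a rough sketch rather than a finished solution. The helper name printable_parts is made up for illustration; it only relies on the isMultipart, parts, contentType and decoded methods of Mail::Message, and it treats every part that is neither text/plain nor text/html as an attachment to skip:

```perl
use strict;
use warnings;

# printable_parts is a hypothetical helper. It expects a Mail::Message
# (or any object offering the same methods) and returns two array refs:
# the decoded text/plain bodies and the decoded text/html bodies.
sub printable_parts {
    my ($msg) = @_;

    # 'RECURSE' asks Mail::Message to descend into nested multiparts too.
    my @parts = $msg->isMultipart ? $msg->parts('RECURSE') : ($msg);

    my ( @plain, @html );
    for my $part (@parts) {
        my $type = lc $part->contentType;
        if ( $type eq 'text/plain' ) {
            push @plain, $part->decoded->string;
        }
        elsif ( $type eq 'text/html' ) {
            push @html, $part->decoded->string;
        }
        # Any other type is assumed to be an attachment and is ignored.
    }
    return ( \@plain, \@html );
}
```

Inside the message loop this could be used as my ($plain, $html) = printable_parts($msg); and the body taken from @$plain when it is non-empty, falling back to @$html otherwise. The HTML fallback still needs converting to text before printing, and a text/plain file sent as an attachment would be included as well.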
innaM
A: 

It looks like someone has already solved this on the linuxquestions forum.

From the forum:

            # This is part of Mail::POP3Client: fetch the headers and body of the POP3 mail in question
            $body = $connection->HeadAndBody($i);
            # Parse the message with MIME::Parser; the result is a MIME entity
            $msg = $parser->parse_data($body);
            # Find out whether this is a multipart MIME message or just plaintext
            $num_parts = $msg->parts;
            # If it has 0 parts, it is plaintext
            if ($num_parts == 0) {
                # Get the message body with POP3Client
                $message = $connection->Body($i);
                # Use this series of regular expressions to make sure that it's OK for MySQL
                $message =~ s/</&lt;/g;
                $message =~ s/>/&gt;/g;
                $message =~ s/'//g;
            }
            else {
                # If it is MIME, then put the first part (the plaintext) into a string
                $message = $msg->parts(0)->bodyhandle->as_string;
            }
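
The snippet above assumes that the first MIME part is the plaintext one, which is not guaranteed. A more defensive variant might look like the sketch below, assuming MIME::Parser (from the MIME-tools distribution on CPAN) is installed. The helper name first_plain_text and the sample message are invented for illustration:

```perl
use strict;
use warnings;
use MIME::Parser;

# first_plain_text is a hypothetical helper: it walks a MIME::Entity
# depth-first and returns the body of the first text/plain part, or
# nothing (undef in scalar context) when the message has none.
sub first_plain_text {
    my ($entity) = @_;

    if ( $entity->is_multipart ) {
        for my $part ( $entity->parts ) {
            my $text = first_plain_text($part);
            return $text if defined $text;
        }
        return;
    }
    return unless $entity->effective_type eq 'text/plain';
    my $bh = $entity->bodyhandle or return;
    return $bh->as_string;
}

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded parts in memory, no temp files

# Made-up sample message with the HTML part first and the plaintext second.
my $raw = join "\n",
    'MIME-Version: 1.0',
    'Content-Type: multipart/alternative; boundary="XX"',
    '',
    '--XX',
    'Content-Type: text/html',
    '',
    '<p>Hi</p>',
    '--XX',
    'Content-Type: text/plain',
    '',
    'Hi',
    '--XX--',
    '';

my $entity = $parser->parse_data($raw);
my $text   = first_plain_text($entity);
print $text if defined $text;
```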
Colin Pickard
Could you repair the link to linuxquestions.org?
innaM
meh. if i could type i'd be dangerous. fixed.
Colin Pickard