I use the following code to generate a spam report using SpamAssassin:

use Mail::SpamAssassin;

my $sa = Mail::SpamAssassin->new();

open FILE, "<", "mail.txt";
my @lines = <FILE>;
my $mail = $sa->parse(@lines);

my $status = $sa->check($mail);

my $report = $status->get_report();
$report =~ s/\n/\n<br>/g;

print "<h1>Spam Report</h1>";
print $report;

$status->finish();
$mail->finish();
$sa->finish();

The problem I have is that it classifies 'sample-nonspam.txt' as spam:

Content preview: [...] 

Content analysis details: (6.9 points, 5.0 required) 

 pts rule name              description
---- ---------------------- --------------------------------------------------
-0.0 NO_RELAYS              Informational: message was not relayed via SMTP
 1.2 MISSING_HEADERS        Missing To: header
 0.1 MISSING_MID            Missing Message-Id: header
 1.8 MISSING_SUBJECT        Missing Subject: header
 2.3 EMPTY_MESSAGE          Message appears to have no textual parts and no
                            Subject: text
-0.0 NO_RECEIVED            Informational: message has no Received headers
 1.4 MISSING_DATE           Missing Date: header
 0.0 NO_HEADERS_MESSAGE     Message appears to be missing most RFC-822 headers

But that information *is* in the file. What worries me is that the documentation states: "Parse will return a Mail::SpamAssassin::Message object with just the headers parsed." Does that mean it will not return a full message?

Kind regards,

Matthias Vance

A: 

You're missing a single character:

my $mail = $sa->parse(\@lines);

From the docs (with emphasis added):

parse($message, $parse_now [, $suppl_attrib])

Parse will return a Mail::SpamAssassin::Message object with just the headers parsed. When calling this function, there are two optional parameters that can be passed in: $message is either undef (which will use STDIN), a scalar of the entire message, an array reference of the message with 1 line per array element, or a file glob which holds the entire contents of the message; and $parse_now, which specifies whether or not to create the MIME tree at parse time or later as necessary.

With the change above, I get the following output (HTML stripped):

 pts rule name              description
---- ---------------------- --------------------------------------------------
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.0000]
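
As a rough illustration of why the backslash matters: Perl flattens an array in an argument list, so parse(@lines) sees every line as a separate positional argument, and only the first line lands in $message. The sketch below uses a hypothetical stand-in sub, show_args, which is not part of Mail::SpamAssassin; it only reports what arrives in the first parameter and how many arguments were passed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parse(): it does no parsing, it only reports
# what kind of value arrived as the first argument and how many arguments
# were passed in total.
sub show_args {
    my ($message) = @_;
    my $kind = ref($message) || 'plain scalar';
    return "first argument: $kind, argument count: " . scalar(@_);
}

my @lines = ("Subject: hi\n", "Date: today\n", "\n", "body\n");

# The array is flattened: four separate scalars are passed.
print show_args(@lines), "\n";

# A single argument is passed: a reference to the whole array.
print show_args(\@lines), "\n";
```

In the first call the stand-in receives only "Subject: hi\n" as its message, which mirrors the report above complaining about missing headers and an empty body.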

As the docs mention, parse is flexible. You could instead use

my $mail = $sa->parse(join "" => <FILE>);  # scalar of the entire message

or

my $mail = $sa->parse(\*FILE);             # a file glob with the entire contents

or

my $mail;
{ local $/; $mail = $sa->parse(<FILE>) }   # scalar of the entire message

or even

open STDIN, "<", "mail.txt" or die "$0: open: $!";
my $mail = $sa->parse(undef);              # undef means read STDIN

For these last four examples to work as expected, remove the line my @lines = <FILE>; once that line has read the whole file, FILE is already at end-of-file.

Greg Bacon
The line you highlighted is about *input*; the line I highlighted was about *output*. The way to do it is slightly different.
Matthias Vance
@Matthias Maybe I should have emphasized it in my answer, but I was able to reproduce the problem you were seeing and fixed it (*i.e.*, successfully got a good report that didn't complain about missing header fields) by passing a *reference to* @lines rather than @lines itself.
Greg Bacon
I think both our answers are possible, but you answered my direct question, therefore I'm accepting it. Thank you!
Matthias Vance
A: 

This is the right way to construct a Message:

my $mail = Mail::SpamAssassin::Message->new({ "message" => $content });
Matthias Vance