I have a small application that processes emails downloaded from an IMAP server with fetchmail. The processing consists of finding base64-encoded attachments that contain an XML file.

Here is the code (somewhat stripped):

def extract_data_from_mailfile(mailfile)
  begin
    mail = TMail::Mail.load(mailfile)
  rescue
    return nil # Mail file could not be loaded.
  end

  # Collect the body of every part of a multipart mail.
  bodies_found = []
  if mail.multipart? then
    mail.parts.each do |m|
      bodies_found << m.body
    end
  end

  ## Let's check the parts we found in the mail to see if one of them
  ## looks XML-ish. Hacky but works for now.
  bodies_found.each do |body|
    if body =~ /^<\?XML /i then
      return body
    end
  end
  return nil # Nothing found.
end

This works great for small attachments, but it breaks on large XML files (typically mail files >600k): the body comes back truncated.

>> mail.parts[1].body.size 
=> 487424    <-- should have been larger - doesn't include the end of the file

Base64 decoding doesn't happen automatically either. This is what I get when I run the decode manually:

>> Base64::decode64(mail.parts[1].body)
[...] ll="SMTP"></Sendt><Sendt"

That's part of the XML-file, but it has been clipped.

Is there any way to get the entire attachment? Any tips?

A: 

I see your code breaks out of the loop at the first XML fragment it finds. Perhaps the larger messages divide their XML into smaller chunks inside the same multipart MIME message? You would then return an array of bodies and concatenate them:

mail.parts[1].body[0] + mail.parts[1].body[1]
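
A slightly fuller sketch of the same idea, assuming the attachment really is split across several parts and that each part is Base64-encoded on its own. The helper names `join_decoded_bodies` and `xml_from_bodies` are made up for illustration and are not part of TMail; `bodies` is assumed to be an array of raw part bodies, like the `bodies_found` array in the question but restricted to the Base64 parts. Each chunk is decoded separately before joining, so padding at the end of one chunk cannot interfere with the next.

```ruby
require "base64"

# Hypothetical helper (not a TMail method): decode every Base64 chunk on its
# own, then join the decoded pieces into a single string.
def join_decoded_bodies(bodies)
  bodies.map { |body| Base64.decode64(body) }.join
end

# Hypothetical helper: return the joined document if it starts like an XML
# file, otherwise nil (same "looks XML-ish" check as in the question).
def xml_from_bodies(bodies)
  xml = join_decoded_bodies(bodies)
  xml =~ /\A\s*<\?xml /i ? xml : nil
end

# Usage with two hand-made chunks standing in for the mail parts.
chunks = [
  Base64.encode64("<?xml version=\"1.0\"?><Sendt>"),
  Base64.encode64("</Sendt>")
]
document = xml_from_bodies(chunks)
```

If the parts turn out not to be split this way, the joined result will simply be as clipped as the single body, which at least rules this explanation out.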

(PS: It's a long shot; I haven't tried this.)

Felix Ogg