I have a Rails application which processes incoming emails via IMAP. Currently it uses a method that recursively searches the parts of a TMail object for a given content_type:

def self.search_parts_for_content_type(parts, content_type = 'text/html')
  parts.each do |part|
    # Return the body of the first part whose content type matches
    return part.body if part.content_type == content_type

    # Otherwise descend into nested multipart containers
    if part.multipart?
      body = search_parts_for_content_type(part.parts, content_type)
      return body if body
    end
  end

  false
end

These emails are generally replies to an HTML email the application sent out in the first place. (The original outbound email is never the same.) The body text the method above returns contains the full history of the email, and I would like to parse out just the reply text.

  1. I'm wondering whether it's reasonable to place some '---please reply above this line---' text at the top of the mail, as I have seen in a 37signals application.

  2. Is there another way to ignore the client-specific additions to the email, other than writing a multitude of regular expressions (which I haven't yet attempted) for each and every mail client? They all seem to tack on their own bit at the top of any replies.

A: 

I think you will be stuck on this one. I have been working with emails in TMail myself recently, and an email that has an HTML part is generally structured like this:

part 1 - multipart/mixed
  sub part 1 - text/plain
  sub part 2 - text/html
end

The email clients I have played with, Outlook and Gmail, both generate replies in this format, and they generally just quote the original email inline in the reply. At first I thought that the 'old' parts of the original email would be separate parts, but they are not: the old text is simply merged into the reply part.

You could search the part for a line that begins with 'From: ' (most clients place a header at the top of the quoted original detailing who sent it, etc.), but it's probably not guaranteed to be there.
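
A minimal sketch of that heuristic, working on a plain string rather than a TMail part. The method name `strip_from_header_quote` is made up for illustration, and the assumption that the quoted original starts at the first line beginning with 'From: ' will not hold for every client:

```ruby
# Hypothetical helper: returns the text above the first line starting with "From: ".
# If no such line exists, the whole body is returned unchanged.
def strip_from_header_quote(body)
  index = body =~ /^From: /
  index ? body[0, index].rstrip : body
end

body = "Sounds good, thanks.\n\nFrom: Support <support@example.com>\nSent: Monday\n\nOriginal text"
puts strip_from_header_quote(body)
```

A reply that itself contains a line starting with 'From: ' would be cut short, so this is best treated as a fallback.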

I don't really see anything wrong with a '--- please reply above this line ---' marker. It's not that invasive, and it could make things a lot simpler.
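
Cutting the body at such a marker could look roughly like this. It is only a sketch: `REPLY_MARKER` and `text_above_marker` are hypothetical names, and the marker text has to match whatever the outbound email actually contains:

```ruby
# Assumed marker text; it must match what the outbound email really includes.
REPLY_MARKER = '--- please reply above this line ---'

# Hypothetical helper: keeps only the text above the marker line.
# Quote prefixes such as "> " in front of the marker are tolerated.
def text_above_marker(body, marker = REPLY_MARKER)
  pattern = /^[> \t]*#{Regexp.escape(marker)}/i
  index = body =~ pattern
  index ? body[0, index].rstrip : body
end

reply = "Yes, go ahead.\n\nOn Mon, Support wrote:\n> --- please reply above this line ---\n> Original text"
puts text_above_marker(reply)
```

Note that the client's own attribution line ("On Mon, Support wrote:" in this made-up example) sits above the marker and survives the cut, so it would still need separate handling.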

Stephen ODonnell
Thanks for your reply. I've had a play with a few variations of incoming emails, including emails with attachments, and found the same structure you mentioned. As you say, nothing seems to be guaranteed. It seems silly to me that pattern matching like this is the only way to go. Even with the '--- reply above here ---' line you still need to handle the email client specifics, as those naturally go above the line anyway :(
tsdbrown
+1  A: 

I have to do email reply parsing on a project I'm working on right now. I ended up using pattern matching to identify the response part, so users wouldn't have to worry about where to insert their reply.

The good news is that the implementation really isn't too difficult. The hard part is testing all the different email clients and services you want to support and figuring out how to identify each one. Generally, you can use the Message-ID, X-Mailer, or Return-Path header to determine where an incoming email came from.

Here's a method that takes a TMail object, extracts the response part of the message, and returns it along with the email client/service it was sent from. It assumes you have the original message's From: name and address in the constants FROM_NAME and FROM_ADDRESS.

def find_reply(email)
  message_id = email.message_id('')
  x_mailer = email.header_string('x-mailer')

  # Escape the constants so characters such as '.' or '+' are matched literally
  from_name = Regexp.escape(FROM_NAME)
  from_address = Regexp.escape(FROM_ADDRESS)

  # Each rule is [client name, detector lambda, pattern marking the start of the quoted original].
  # For optimization, this list could be sorted from most popular to least popular email client/service
  rules = [
    [ 'Gmail', lambda { message_id =~ /.+gmail\.com>\z/ }, /^.*#{from_name}\s+<#{from_address}>\s*wrote:.*$/ ],
    [ 'Yahoo! Mail', lambda { message_id =~ /.+yahoo\.com>\z/ }, /^_+\nFrom: #{from_name} <#{from_address}>$/ ],
    [ 'Microsoft Live Mail/Hotmail', lambda { email.header_string('return-path') =~ /<.+@(hotmail|live)\.com>/ }, /^Date:.+\nSubject:.+\nFrom: #{from_address}$/ ],
    [ 'Outlook Express', lambda { x_mailer =~ /Microsoft Outlook Express/ }, /^----- Original Message -----$/ ],
    [ 'Outlook', lambda { x_mailer =~ /Microsoft Office Outlook/ }, /^\s*_+\s*\nFrom: #{from_name}.*$/ ],

    # TODO: other email clients/services

    # Generic fallback
    [ nil, lambda { true }, /^.*#{from_address}.*$/ ]
  ]

  # Default to using the whole body as the reply (maybe the user deleted the original message when they replied?)
  notes = email.body
  source = nil

  # Try to detect which email service/client sent this message
  rules.find do |r|
    if r[1].call
      # Try to extract the reply.  If we find it, save it and cancel the search.
      reply_match = email.body.match(r[2])
      if reply_match
        notes = email.body[0, reply_match.begin(0)]
        source = r[0]
        next true
      end
    end
  end

  [notes.strip, source]
end
Keith Platfoot
A: 

Check out http://pushreply.com. There is no need to write complicated email-parsing code: reply notifications are sent as HTTP POST requests.

Disclaimer: I'm the creator of this service.

Andrei Savu